Abstract —Caching at small base stations (SBSs) has demonstrated
significant benefits in alleviating the backhaul requirement in
heterogeneous cellular networks (HetNets). While many existing works
focus on what contents to cache at each SBS, an equally important
problem is what contents to deliver so as to satisfy dynamic user
demands given the cache status. In this paper, we study optimal
content delivery in cache-enabled HetNets by taking into account the
inherent multicast capability of the wireless medium. We consider
stochastic content multicast scheduling to jointly minimize the
average network delay and power costs under a multiple access
constraint. We establish a content-centric request queue model and
formulate this stochastic optimization problem as an infinite horizon
average cost Markov decision process (MDP). By using relative value
iteration and special properties of the request queue dynamics, we
characterize some properties of the value function of the MDP. Based
on these properties, we show that the optimal multicast scheduling
policy is of threshold type. Then, we propose a structure-aware
optimal algorithm to obtain the optimal policy. We also propose a
low-complexity suboptimal policy, which possesses similar structural
properties to the optimal policy, and develop a low-complexity
algorithm to obtain this policy.

Index Terms —Heterogeneous cellular networks, wireless 
caching, content-centric, multicast, Markov decision process, 
structural properties, queueing. 


I. Introduction 

The rapid proliferation of smart mobile devices has triggered
an unprecedented growth of the global mobile data traffic, 
which is expected to reach 24.3 exabytes per month by 2019 
(E. One promising approach to meet the dramatic traffic 
growth is to deploy small base stations (SBSs) together with 
traditional macro base stations (MBSs) in a heterogeneous 
network paradigm 0. Such a heterogeneous cellular network 
(HetNet) provides short-range localized communications by 
bringing base stations (BSs) closer to users, and hence in¬ 
creases the area spectral efficiency and network capacity. How¬ 
ever, the main drawback of this approach is the requirement of 
expensive high-speed backhaul links for connecting all SBSs 
to the core network. The backhaul capacity requirement can 
be enormously high during peak traffic hours. 

Recently, caching at BSs has been proposed as an effective 
way to alleviate the backhaul capacity requirement and im¬ 
prove the user-perceived quality of experience in wireless net¬ 
works a-Gi. Caching has received significant attention in the 
literature ISl- lfT^ . Specifically, in E, the authors introduce
the concept of FemtoCaching and study content placement at 
SBSs to minimize the average content access delay. In E, 
the authors consider joint request routing and content caching 
for HetNets and propose approximate algorithms to minimize 
the content access delay. In mni, the authors study the joint 
optimization of power and cache control for video streaming in 
multi-cell multi-user MIMO systems. References ED and ED 
study outage probability and average delivery rate of cache- 
enabled HetNets for given caching strategies. However, jSl- 
ED consider point-to-point unicast transmission for cache- 
enabled wireless networks and can only help to reduce the 
backhaul burden without effectively relieving the “on air” 
congestion. Furthermore, in ||8l- E2l . the inherent broadcast 
nature of wireless medium is not fully exploited, which is 
the major distinction of wireless communications from wired 
communications. 

Enabling multicasting at BSs is another efficient approach 
to deliver popular contents to multiple requesters concurrently. 
Wireless multicasting has been specified in 3GPP standards 
known as evolved Multimedia Broadcast Multicast Service 
(eMBMS) E3l . In view of the benefits of caching and 
multicasting, joint design of the two promising techniques is 
expected to achieve superior performance for massive content 
delivery in wireless networks. From an information-theoretic 
perspective, El proposes a novel coded caching scheme to 
minimize the peak traffic load for a single-cell network by uti¬ 
lizing multicast transmission and caches at users, and charac¬ 
terizes the memory-rate tradeoff. Note that, lfT4l only considers 
the delay-insensitive services. However, many content-centric 
applications, such as video streaming, are delay-sensitive, and
it is critical to consider delay performance in cache-enabled 
content-centric wireless networks El-ESl. In specific, the 
authors in El study coded multicasting for inelastic services 
(with strict deadline) in a single-cell network, and propose 
a computationally efficient content delivery algorithm under 
a given coded caching scheme. In Ebl . the authors propose 
an approximate caching algorithm with performance guarantee 
and a heuristic caching algorithm, to reduce the service cost 
for inelastic services in a cache-enabled small-cell network 
under a given multicast transmission strategy. In ED, the 
authors consider multicasting for inelastic services in a cache- 
enabled multi-cell network, and propose joint throughput- 
optimal caching and scheduling algorithms to maximize the 
service rates of the inelastic services. In our recent work ED, 
we consider optimal multicast scheduling to jointly minimize
the average delay and service costs of elastic services
(delay-sensitive services but without strict deadlines) for a
cache-enabled single-cell network. However, for elastic services, it
remains unknown how to design optimal multicast scheduling
to jointly minimize average delay and service costs for given
cache placement in cache-enabled HetNets. The main challenge
of determining optimal multicast scheduling in HetNets
stems from the heterogeneous structure of the network (see a
motivating example in Section II-D).

In this paper, we consider a cache-enabled HetNet with one 
MBS, N SBSs, K users, and M contents (with possibly differ¬ 
ent content sizes). The SBS coverage areas are assumed to be 
disjoint. Assume that the MBS and the SBSs are not allowed 
to operate concurrently, to avoid excessive interference, while 
the SBSs are allowed to operate at the same time without 
mutual interference. This is referred to as the multiple access 
constraint. Each SBS is equipped with a cache storing a certain 
number of contents, depending on the sizes of the cached 
contents and the cache size. The MBS stores all contents in 
the network. In each slot, each BS either schedules one cached 
content for multicasting to serve the pending requests from 
the users in its coverage area, or keeps idle, i.e., does not 
transmit any content. We consider stochastic content multicast 
scheduling to jointly minimize the average network delay and 
power costs under the multiple access constraint. We establish 
a content-centric request queue model and then formulate this 
stochastic multicast scheduling problem as an infinite horizon 
average cost Markov decision process (MDP) 03. There are 
several technical challenges involved. 

• Optimality analysis: The infinite horizon average cost 
MDP is well-known to be challenging due to the curse of 
dimensionality. Although dynamic programming provides a 
systematic approach for MDPs, there generally exist only 
numerical solutions. These solutions do not typically offer 
many design insights and are usually impractical due to the 
curse of dimensionality ED. Thus, it is highly desirable to 
study the structural properties of the optimal policy. There 
are several existing works on structural analysis for hybrid 
systems ISOj-Ell. However, the system models and queueing 
models in these works significantly differ from ours, and hence 
the approaches and results therein cannot be straightforwardly 
extended to our work. Specifically, our problem can be viewed 
as a problem of scheduling a single broadcast server (the MBS) 
or multiple multicast servers (the SBSs) to parallel (request) 
queues with general arrivals. Existing works have only studied 
the problems of scheduling a single broadcast server to parallel 
queues mi, EH, El. Therefore, the structural analysis of 
the optimal multicast scheduling of a single broadcast server 
or multiple broadcast servers to parallel queues with general 
arrivals remains unknown and cannot be straightforwardly 
extended from the existing solutions. 

• Algorithm design: Standard numerical algorithms such 
as value iteration and policy iteration to MDPs are usually 
computationally impractical for real systems. To reduce the 
complexity, several structured optimal algorithms, which in¬ 
corporate the structural properties into standard algorithms, are 
proposed ll25l . ED- However, the curse of dimensionality still 
remains an issue. On the other hand, the structural properties 
of the optimal policy may be one key reason for its good 
performance. Thus, it is highly desirable to further reduce 


the complexity of the structured optimal algorithms, while 
maintaining similar structural properties to the optimal policy. 

By using relative value iteration algorithm (RVIA) im 
Chapter 4.3.1] and special properties of the request queue 
dynamics, we characterize some properties of the value func¬ 
tion of the MDP. Based on these properties, we show that 
the optimal multicast scheduling policy, which is adaptive 
to the request queue state, is of threshold type. This reveals 
the tradeoff between the average delay cost and the average 
power cost. Then, we propose a structure-aware optimal al¬ 
gorithm by exploiting the structural properties of the opti¬ 
mal policy. To further reduce the computational complexity, 
using approximate dynamic programming C3, we propose 
a low-complexity suboptimal policy, which possesses similar 
structural properties to the optimal policy, and develop a 
low-complexity algorithm to compute this policy. Numerical 
examples verify the theoretical results and demonstrate the 
performance of the optimal and suboptimal solutions. 

The rest of this paper is organized as follows. Section
II introduces the network model. Section III provides the
formulation of the stochastic multicast scheduling problem and
the optimality equation. The structural properties of the optimal
policy are presented in Section IV and a structure-aware
optimal algorithm is proposed in Section V. In Section VI,
we propose a low-complexity suboptimal solution. Numerical
results are provided in Section VII. Finally, we conclude the
paper and provide several future directions in Section VIII.
The important notations used in this paper are summarized in
Table I.


TABLE I: List of important notations

Notation                                        Description
$\mathcal{N}^+$                                 set of all SBSs
$\mathcal{N} = \mathcal{N}^+ \cup \{0\}$        set of all BSs; BS 0: MBS
$\mathcal{K}_0$                                 set of users not covered by any SBS
$\mathcal{K}_n, n \in \mathcal{N}^+$            set of users within coverage area of SBS $n$
$\mathcal{K}$                                   set of all users
$\mathcal{M}$                                   set of all contents
$\mathcal{M}_n, n \in \mathcal{N}$              set of contents cached at BS $n$
$\mathcal{N}_m$                                 set of SBSs caching content $m$
$n, k, m, t$                                    BS, user, content, slot index
$p(n, m)$                                       minimum transmission power required by BS $n$ to
                                                deliver content $m$ to all associated users within a slot
$\mathbf{A} = (A_{n,m})$                        request arrival matrix
$\mathbf{Q} = (Q_{n,m})$                        request queue state matrix
$S(\mathbf{Q})$                                 sum request queue length
$\mathbf{u} = (u_n)_{n \in \mathcal{N}}$        multicast scheduling action
$\mu$                                           stationary multicast scheduling policy
$V(\cdot)$                                      value function
$J(\mathbf{Q}, \mathbf{u})$                     state-action cost function


II. Network Model

Fig. 1: Cache-enabled heterogeneous cellular network.

Consider a cache-enabled HetNet with one MBS, $N$ SBSs,
$K$ users, and $M$ contents, as illustrated in Fig. 1. Let
$\mathcal{N} = \{0, 1, 2, \cdots, N\}$ denote the set of all BSs, where BS
0 refers to the MBS and BS $n = 1, 2, \cdots, N$ refers to SBS
$n$. Let $\mathcal{N}^+ = \{1, 2, \cdots, N\}$ denote the set of $N$ SBSs.
The SBS coverage areas are assumed to be disjoint. Let
$\mathcal{K} = \{1, 2, \cdots, K\}$ denote the set of $K$ users in the network.
Let $\mathcal{K}_n \subseteq \mathcal{K}$ denote the set of users within the coverage area
of SBS $n \in \mathcal{N}^+$. All the users in $\mathcal{K}_n$ ($n \in \mathcal{N}^+$) can be
served by the MBS and SBS $n$. Let
$\mathcal{K}_0 = \mathcal{K} - \cup_{n \in \mathcal{N}^+} \mathcal{K}_n$
denote the set of users not covered by any SBS. All the users
in $\mathcal{K}_0$ can only be served by the MBS. Note that we do
not distinguish the users in $\mathcal{K}_n$. Let $\mathcal{M} = \{1, 2, \cdots, M\}$
denote the set of $M$ contents (with possibly different content
sizes) in the network. Each BS is equipped with a cache
storing a certain number of contents, depending on the cache
size and the sizes of the cached contents. Let $\mathcal{M}_n \subseteq \mathcal{M}$
denote the set of cached contents in BS $n \in \mathcal{N}$. We assume
$\mathcal{M}_0 = \mathcal{M}$, i.e., the MBS stores all contents in the network.
Let $\mathcal{N}_m = \{n \in \mathcal{N}^+ \,|\, m \in \mathcal{M}_n\}$ denote the set of SBSs
caching content $m \in \mathcal{M}$. We assume that the contents stored
in the caches are given according to certain caching strategy 
(similar assumptions have been made in the literature, e.g. 
CD, CD, CD) and consider multicast scheduling for a given 
caching design. Notice that caching in general is in a much 
larger time-scale (e.g., on a weekly or monthly basis) while 
multicast scheduling is in a shorter time-scale a, 0, Cl. 
Consider time slots of unit length (without loss of generality) 
indexed by t = 1,2, • • •. In the sequel, we first introduce the 
request arrival model, the service model, and the request queue 
model. Then, we provide a motivating example that highlights 
the challenges in designing the optimal multicast scheduling. 

A. Request Arrival Traffic 

In each slot, each user submits content requests to the MBS.
Let $A_{n,m}(t) \in \{0, 1, \cdots\}$ denote the number of the requests
for content $m$ from all users in $\mathcal{K}_n$ which arrive during slot $t$,
where $n \in \mathcal{N}$ and $m \in \mathcal{M}$. Let
$\mathbf{A}(t) = (A_{n,m}(t))_{n \in \mathcal{N}, m \in \mathcal{M}}$
denote the request arrival matrix during slot $t$. We assume that
the request arrival processes $\{A_{n,m}(t)\}$ ($n \in \mathcal{N}$, $m \in \mathcal{M}$) are
mutually independent with respect to $n$ and $m$; and $\{A_{n,m}(t)\}$
are i.i.d. with respect to $t$ for all $n \in \mathcal{N}$ and $m \in \mathcal{M}$. Note
that the content request arrivals are modeled according to
the Independent Reference Model (IRM), which is a standard
approach adopted in the literature 0, ifTTll . lIZTll . The MBS 
maintains separate request queues for each BS $n \in \mathcal{N}$ and
each cached content $m \in \mathcal{M}_n$. The request queue model will
be further illustrated in Section II-C.

B. Service Model 

We consider multicast service for content delivery in the 
network. For ease of illustration, we assume that each BS 


can transmit at most one content in each slot. The analytical 
framework and results can be extended to the general case 
in which each BS can transmit multiple contents in one slot. 
In each slot, each BS $n \in \mathcal{N}$ either schedules one cached
content for multicasting to serve the pending requests from all 
users in its coverage area, or keeps idle (i.e., does not transmit 
any content). We consider the Zero Download Delay (ZDD) 
assumption ll^ . i.e., all scheduled contents in one slot can 
be delivered to the users within the same slot. Let $p(n, m)$
denote the minimum transmission power required by BS $n$
for successfully delivering one cached content $m$ to all users
in its coverage area within a scheduling slot, where $n \in \mathcal{N}$
and $m \in \mathcal{M}_n$. We set $p(n, 0) = 0$ for all $n \in \mathcal{N}$. If BS
$n$ multicasts content $m$ with transmission power $p(n, m)$, all
pending requests for content $m$ from all users in the coverage
area of BS $n$ are satisfied. Let
$u_n(t) \in \bar{\mathcal{M}}_n \triangleq \mathcal{M}_n \cup \{0\}$
denote the scheduling action of BS $n \in \mathcal{N}$ at slot $t$, where
$u_n(t) \neq 0$ indicates that BS $n$ multicasts the cached content
$u_n(t)$ with transmission power $p(n, u_n(t))$ at slot $t$, and
$u_n(t) = 0$ indicates that BS $n$ does not transmit any content at
slot $t$. Let $\mathbf{u}(t) = (u_n(t))_{n \in \mathcal{N}}$ denote the multicast scheduling
action in the network at slot $t$.

The MBS is assumed to operate at a much higher transmission
power level than the SBSs, for providing full coverage
of the network. To avoid excessive interference, we therefore
do not allow the MBS and the SBSs to operate concurrently.
On the other hand, since the SBSs are spatially separated and
use much lower powers, we allow the SBSs to operate at the
same time without mutual interference. Mathematically, for all
$t$, we require

$$u_0(t) \sum_{n \in \mathcal{N}^+} u_n(t) = 0. \tag{1}$$

We refer to (1) as the multiple access constraint. Let
$\mathcal{U} \triangleq \{(u_n)_{n \in \mathcal{N}} \,|\, u_n \in \mathcal{M}_n \cup \{0\}\ \forall n \in \mathcal{N} \text{ and } u_0 \sum_{n \in \mathcal{N}^+} u_n = 0\}$
denote the feasible multicast scheduling action space. The
network power cost $p(\mathbf{u})$ associated with $\mathbf{u} \in \mathcal{U}$ is given by

$$p(\mathbf{u}) \triangleq \sum_{n \in \mathcal{N}} p(n, u_n). \tag{2}$$
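
As a minimal sketch of the multiple access constraint (1) and the power cost (2), the feasible action space can be enumerated directly for a small network. The helper names `feasible_actions` and `power_cost`, the cache configuration, and the power values below are illustrative assumptions and are not taken from the paper.

```python
from itertools import product

def feasible_actions(cached):
    """Enumerate the feasible action space under constraint (1).

    cached[n] is the set of contents cached at BS n (BS 0 is the MBS);
    action value 0 means "idle". Hypothetical helper for illustration.
    """
    n_bs = len(cached)
    actions = []
    # Case 1: the MBS multicasts a content, so every SBS must stay idle.
    for m in sorted(cached[0]):
        actions.append((m,) + (0,) * (n_bs - 1))
    # Case 2: the MBS is idle; each SBS idles or multicasts a cached content.
    sbs_choices = [[0] + sorted(cached[n]) for n in range(1, n_bs)]
    for combo in product(*sbs_choices):
        actions.append((0,) + combo)
    return actions

def power_cost(u, p):
    """Network power cost (2): sum of p(n, u_n), with p(n, 0) = 0."""
    return sum(p[(n, m)] for n, m in enumerate(u) if m != 0)

# Assumed toy configuration: MBS caches {1, 2, 3}, SBS 1 {1, 2}, SBS 2 {2, 3}.
cached = [{1, 2, 3}, {1, 2}, {2, 3}]
# Assumed power values: MBS multicast costs 10, SBS multicast costs 1.
p = {(0, m): 10.0 for m in cached[0]}
p.update({(n, m): 1.0 for n in (1, 2) for m in cached[n]})

U = feasible_actions(cached)
print(len(U), power_cost((0, 2, 3), p), power_cost((2, 0, 0), p))
```

Splitting the enumeration into the two cases avoids generating and then filtering infeasible joint actions, which matters once the number of SBSs grows.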

C. Request Queue Model 

As illustrated above, for each SBS $n \in \mathcal{N}^+$, the requests
for cached content $m \in \mathcal{M}_n$ from all the users in $\mathcal{K}_n$ can
be served by both the MBS and SBS $n$, while the requests
for uncached content $m \in \mathcal{M}_0 \setminus \mathcal{M}_n$ can only be served by
the MBS. On the other hand, the MBS can serve the requests
for any content $m \in \mathcal{M}_0$ from the users in $\mathcal{K}_n$. Therefore,
the request queues maintained by the MBS are constructed
as follows. For each SBS $n \in \mathcal{N}^+$ and each cached content
$m \in \mathcal{M}_n$, we construct a separate request queue, referred to as
queue $(n, m)$, storing the requests for content $m$ from all the
users in $\mathcal{K}_n$. Let $Q_{n,m}(t)$ denote the length of queue $(n, m)$
at the beginning of slot $t$, where $n \in \mathcal{N}^+$ and $m \in \mathcal{M}_n$. For
the MBS $n = 0$ and each content $m \in \mathcal{M}_0$, we also construct
a separate request queue, referred to as queue $(0, m)$, storing
the requests for content $m$ from all users in
$\cup_{n \in \mathcal{N}^+ \setminus \mathcal{N}_m} \mathcal{K}_n$
(the set of users covered by the SBSs where content $m$ is
not cached) and $\mathcal{K}_0$. Let $Q_{0,m}(t)$ denote the length of queue
$(0, m)$ at the beginning of slot $t$, where $m \in \mathcal{M}_0$. Note
that these request queues can be implemented using counters
and no data is contained in these queues. For each queue
$(n, m)$, let $N_{n,m}$ denote the upper limit of the corresponding
counter. For technical tractability, we assume that $N_{n,m}$ is
finite (but can be arbitrarily large). This assumption guarantees
that the request queue state space is finite, which
greatly simplifies the mathematical arguments [19]. Let
$\mathcal{Q}_{n,m} \triangleq \{0, 1, \cdots, N_{n,m}\}$ denote the request queue state space for
queue $(n, m)$. Let
$\mathbf{Q}(t) \triangleq (Q_{n,m}(t))_{n \in \mathcal{N}, m \in \mathcal{M}_n} \in \mathcal{Q}$ denote
the request queue state of the network at the beginning of
slot $t$, where
$\mathcal{Q} \triangleq \prod_{n \in \mathcal{N}} \prod_{m \in \mathcal{M}_n} \mathcal{Q}_{n,m}$
denotes the request queue state space and the operation $\prod$
denotes the Cartesian product.

For each SBS $n \in \mathcal{N}^+$ and each cached content $m \in \mathcal{M}_n$,
all the pending requests in queue $(n, m)$ are satisfied, if content
$m$ is scheduled for multicasting by the MBS (i.e., $u_0(t) = m$)
or by SBS $n$ (i.e., $u_n(t) = m$) at slot $t$. Thus, for each
$n \in \mathcal{N}^+$ and $m \in \mathcal{M}_n$, the request queue dynamics is as follows:

$$Q_{n,m}(t+1) = \min\{\mathbf{1}(u_0(t) \neq m \ \&\ u_n(t) \neq m)\, Q_{n,m}(t) + A_{n,m}(t),\ N_{n,m}\}, \tag{3}$$

where $\mathbf{1}(\cdot)$ denotes the indicator function.

For the MBS and each content $m \in \mathcal{M}_0$, all the pending
requests in queue $(0, m)$ are satisfied, if content $m$ is scheduled
for multicasting by the MBS at slot $t$ (i.e., $u_0(t) = m$). Thus,
for the MBS and each content $m \in \mathcal{M}_0$, the request queue
dynamics is as follows:

$$Q_{0,m}(t+1) = \min\{\mathbf{1}(u_0(t) \neq m)\, Q_{0,m}(t) + \tilde{A}_{0,m}(t),\ N_{0,m}\}, \tag{4}$$

where $\tilde{A}_{0,m}(t) \triangleq \sum_{n \in \mathcal{N}^+ \setminus \mathcal{N}_m} A_{n,m}(t) + A_{0,m}(t)$ denotes
the total number of requests for content $m$ from all users in
$\cup_{n \in \mathcal{N}^+ \setminus \mathcal{N}_m} \mathcal{K}_n \cup \mathcal{K}_0$ which arrive during slot $t$. Note that
each request is stored in only one queue.
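
The queue dynamics in (3) and (4) can be sketched as a one-slot update. This is an illustrative sketch only: the function name `step_queues`, the use of a single common counter limit `cap` for all queues, and the numbers in the usage lines are assumptions. The arrival entry for queue (0, m) is assumed to be already aggregated over the uncovered users and the SBS areas where content m is not cached.

```python
def step_queues(Q, u, A, cap):
    """Advance all request queues by one slot following (3) and (4).

    Q, A : dicts keyed by (n, m) with queue lengths and new arrivals.
    u    : tuple of scheduling actions, u[0] for the MBS; 0 means idle.
    cap  : common upper limit of every counter (a simplifying assumption).
    """
    Q_next = {}
    for (n, m), length in Q.items():
        if n == 0:
            served = (u[0] == m)                   # only the MBS clears queue (0, m)
        else:
            served = (u[0] == m) or (u[n] == m)    # MBS or SBS n clears queue (n, m)
        kept = 0 if served else length
        Q_next[(n, m)] = min(kept + A[(n, m)], cap)
    return Q_next

# Assumed toy state: the seven queues of a network with two SBSs.
queues = [(0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (2, 2), (2, 3)]
Q = {q: 2 for q in queues}
A = {q: 1 for q in queues}

after_mbs = step_queues(Q, (2, 0, 0), A, cap=10)   # MBS multicasts content 2
after_sbs = step_queues(Q, (0, 2, 3), A, cap=10)   # SBS 1 sends 2, SBS 2 sends 3
print(after_mbs)
print(after_sbs)
```

Because content indices start at 1, an idle action (value 0) never matches a queue's content, so no special case for idling is needed.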

D. Motivating Example 

As illustrated in Fig. 2, consider a network with 1 MBS, 2
SBSs ($\mathcal{N}^+ = \{1, 2\}$), 3 users ($\mathcal{K} = \{1, 2, 3\}$), and 3 contents
($\mathcal{M} = \{1, 2, 3\}$). We set $\mathcal{K}_1 = \{1\}$, $\mathcal{K}_2 = \{2\}$, $\mathcal{K}_0 = \{3\}$,
$\mathcal{M}_1 = \{1, 2\}$, $\mathcal{M}_2 = \{2, 3\}$, and $\mathcal{M}_0 = \{1, 2, 3\}$. According
to Section II-C, the MBS maintains seven request queues, i.e.,
queues (0,1), (0,2), (0,3), (1,1), (1,2), (2,2), and (2,3).
Our goal is to design the optimal multicast scheduling so
as to jointly minimize the average network delay cost and
power cost. This involves two challenging and coupled tasks.
First, at each time slot, it is not clear whether to operate the
MBS or the SBSs. If we schedule the MBS to multicast, then
the pending requests for one content in the whole network
can be satisfied with a higher power cost, e.g., clear queues
(0,2), (1,2), and (2,2) with power $p(0, 2)$. If we schedule the
SBSs to multicast, the pending requests for (possibly different)
contents in different SBS coverage areas can be satisfied with
a lower power cost, e.g., clear queues (1,2) and (2,3) with
power $p(1, 2) + p(2, 3)$.



Fig. 2: An example with 1 MBS, 2 SBSs, 3 users and 3 
contents. 

Second, at each time slot, if BS $n \in \mathcal{N}$ is allowed to operate,
it is unknown whether to keep BS $n$ idle or not, and which
content to schedule for multicasting if BS $n$ is not idle. Take
SBS 1 as an example. Suppose at a certain time slot $t$ we have
$Q_{1,1}(t) > Q_{1,2}(t)$ and $p(1, 1) > p(1, 2)$. Scheduling Content 1
can satisfy more requests with a higher power cost; scheduling
Content 2 can satisfy fewer requests with a lower power cost;
keeping SBS 1 idle consumes zero power cost.

From this example, we can see that the challenges in
designing the optimal multicast scheduling come from the
heterogeneous structure of the network, and the difficulty in
balancing the delay cost and the power cost. In the sequel, we
formalize the multicast scheduling problem and try to tackle
these challenges.
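
The queue construction used in the motivating example can be checked with a short sketch. The helper names `build_queues` and `queue_for_request` are hypothetical; the cache configuration is the one stated in the example (the MBS caches all three contents, SBS 1 caches contents 1 and 2, SBS 2 caches contents 2 and 3).

```python
def build_queues(cached):
    """List the request queues (n, m) kept by the MBS.

    cached[n] is the set of contents cached at BS n, with BS 0 the MBS.
    """
    queues = [(0, m) for m in sorted(cached[0])]
    for n in range(1, len(cached)):
        queues += [(n, m) for m in sorted(cached[n])]
    return queues

def queue_for_request(area, m, cached):
    """Return the queue that stores a request for content m.

    area is the index of the SBS covering the user, or 0 if the user is
    not covered by any SBS. Requests for contents that the covering SBS
    does not cache fall back to the MBS queue (0, m).
    """
    if area != 0 and m in cached[area]:
        return (area, m)
    return (0, m)

cached = [{1, 2, 3}, {1, 2}, {2, 3}]
queues = build_queues(cached)
print(queues)
print(queue_for_request(1, 3, cached))  # user 1 asks for content 3, not cached at SBS 1
```

Each request maps to exactly one queue, which is the property the scheduling model relies on when a multicast clears a queue.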

III. Problem Formulation and Optimality 
Equation 

A. Problem Formulation 

Given an observed request queue state Q, the multicast 
scheduling action u is determined according to a stationary 
policy defined below. 

Definition 1 (Stationary Policy): A feasible stationary multicast
scheduling policy $\mu$ is a mapping from the request queue
state $\mathbf{Q} \in \mathcal{Q}$ to the feasible multicast scheduling action
$\mathbf{u} \in \mathcal{U}$, where $\mu(\mathbf{Q}) = \mathbf{u}$.

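By the queue dynamics in (3) and (4), the induced random
process $\{\mathbf{Q}(t)\}$ under policy $\mu$ is a controlled Markov chain
with the following transition probability:

$$\Pr[\mathbf{Q}' | \mathbf{Q}, \mathbf{u}] \triangleq \Pr[\mathbf{Q}(t+1) = \mathbf{Q}' \,|\, \mathbf{Q}(t) = \mathbf{Q}, \mathbf{u}(t) = \mathbf{u}]
= \mathbb{E}\left[\Pr[\mathbf{Q}(t+1) = \mathbf{Q}' \,|\, \mathbf{Q}(t) = \mathbf{Q}, \mathbf{u}(t) = \mathbf{u}, \mathbf{A}(t) = \mathbf{A}]\right], \tag{5}$$

where the expectation is taken over the distribution of the
request arrival $\mathbf{A}$ and

$$\Pr[\mathbf{Q}(t+1) = \mathbf{Q}' \,|\, \mathbf{Q}(t) = \mathbf{Q}, \mathbf{u}(t) = \mathbf{u}, \mathbf{A}(t) = \mathbf{A}]
= \begin{cases} 1, & \text{if } \mathbf{Q}' \text{ satisfies (3) and (4)} \\ 0, & \text{otherwise.} \end{cases}$$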

We restrict our attention to stationary unichain policies.^1
For a given stationary unichain policy $\mu$, the average network
delay cost is defined as

$$\bar{d}(\mu) \triangleq \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[d(\mathbf{Q}(t))], \tag{6}$$

where $d(\mathbf{Q}) \triangleq \sum_{n \in \mathcal{N}} \sum_{m \in \mathcal{M}_n} Q_{n,m}$ and the expectation
is taken with respect to the measure induced by the random
request arrivals and the policy $\mu$. According to Little's law,
$\bar{d}(\mu)$ reflects the average waiting time in the network under
policy $\mu$. For a given stationary unichain policy $\mu$, the average
network power cost is given by

$$\bar{p}(\mu) \triangleq \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[p(\mathbf{u}(t))]. \tag{7}$$

^1 A unichain policy is a policy under which the induced Markov chain has
a single recurrent class (and possibly some transient states) [19].

In this paper, we would like to jointly minimize the average
network delay cost and the average network power cost. We
adopt the weighted-sum method, which is a commonly used
method for multiobjective optimization problems [28]. Specifically,
we define the average network cost as the weighted sum
of the average network delay cost and the average network
power cost, i.e.,

$$\bar{g}(\mu) \triangleq \bar{d}(\mu) + w \bar{p}(\mu) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[g(\mathbf{Q}(t), \mathbf{u}(t))], \tag{8}$$

where $g(\mathbf{Q}, \mathbf{u}) \triangleq d(\mathbf{Q}) + w p(\mathbf{u})$ is the per-stage network cost
and $w$ is the weight indicating the relative importance of the
average power cost over the average delay cost. Note that $w$
can also be treated as the penalty factor, mimicking the soft
average delay constraint. In other words, $w$ can be thought
of as a Lagrange multiplier on the average delay cost constraint

ED. 
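
To make the weighted-sum cost concrete, the per-stage cost (delay cost plus weighted power cost) and a finite-horizon sample average of it can be sketched as follows. The function names, the two-queue state, the power values, and the weight are illustrative assumptions; the sample average only approximates the limiting average cost in (8).

```python
def per_stage_cost(Q, u, p, w):
    """Per-stage network cost: sum of queue lengths plus w times the power cost.

    Q : dict keyed by (n, m) with queue lengths.
    u : tuple of actions, one per BS; 0 means idle.
    p : dict keyed by (n, m) with transmission powers (p(n, 0) = 0 implied).
    w : weight on the power cost.
    """
    delay = sum(Q.values())
    power = sum(p[(n, m)] for n, m in enumerate(u) if m != 0)
    return delay + w * power

def sample_average_cost(trajectory, p, w):
    """Average the per-stage cost over a finite list of (Q, u) pairs."""
    return sum(per_stage_cost(Q, u, p, w) for Q, u in trajectory) / len(trajectory)

# Assumed toy numbers: one MBS, one SBS, content 1 cached at both.
Q = {(0, 1): 2, (1, 1): 3}
p = {(0, 1): 10.0, (1, 1): 1.0}
w = 2.0

cost_sbs = per_stage_cost(Q, (0, 1), p, w)   # SBS multicasts content 1
cost_mbs = per_stage_cost(Q, (1, 0), p, w)   # MBS multicasts content 1
print(cost_sbs, cost_mbs, sample_average_cost([(Q, (0, 1)), (Q, (1, 0))], p, w))
```

A larger `w` makes transmissions more expensive relative to waiting requests, which pushes a cost-minimizing policy toward idling longer.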

We wish to find an optimal multicast scheduling policy to
minimize the average network cost $\bar{g}(\mu)$ in (8).

Problem 1 (Network Cost Minimization):

$$g^* \triangleq \min_{\mu} \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[g(\mathbf{Q}(t), \mathbf{u}(t))], \tag{9}$$

where $\mu$ is a stationary unichain policy in Definition 1 and $g^*$
denotes the minimum average network cost achieved by the
optimal policy $\mu^*$.^2

Notice that if $w = 0$, Problem 1 reduces to the delay
minimization problem, which can still be covered by our
framework. However, if power minimization is the sole
goal, this optimization problem is no longer meaningful, as
all BSs would keep idle and no requests would be served. In
Problem 1, we assume the existence of a stationary unichain
policy achieving the minimum in (9). Later in Lemma 1, we
shall prove the existence of such a policy. Problem 1 is an
infinite horizon average cost MDP, which is challenging due
to the curse of dimensionality.

^2 Restricting the search for the optimal policy to unichain policies is to
guarantee the existence of stationary optimal policies. This is widely used in 
the literature, e.g., (To), da, do) 


B. Optimality Equation 

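The optimal multicast scheduling policy $\mu^*$ can be obtained
by solving the following Bellman equation.

Lemma 1 (Bellman Equation): There exist a scalar $\theta$ and a
value function $V(\cdot)$ satisfying:

$$\theta + V(\mathbf{Q}) = \min_{\mathbf{u} \in \mathcal{U}} \{g(\mathbf{Q}, \mathbf{u}) + \mathbb{E}[V(\mathbf{Q}')]\}, \quad \forall \mathbf{Q} \in \mathcal{Q}, \tag{10}$$

where the expectation is taken over the distribution of the
request arrival $\mathbf{A}$, and $\mathbf{Q}' \triangleq (Q'_{n,m})_{n \in \mathcal{N}, m \in \mathcal{M}_n}$ with
$Q'_{0,m} = \min\{\mathbf{1}(u_0 \neq m)\, Q_{0,m} + \tilde{A}_{0,m},\ N_{0,m}\}$ for all $m \in \mathcal{M}_0$ and
$Q'_{n,m} = \min\{\mathbf{1}(u_0 \neq m \ \&\ u_n \neq m)\, Q_{n,m} + A_{n,m},\ N_{n,m}\}$
for all $n \in \mathcal{N}^+$ and $m \in \mathcal{M}_n$. Furthermore, $\theta = g^*$ is the
optimal value to Problem 1 for all initial states $\mathbf{Q}(1) \in \mathcal{Q}$ and
the optimal policy achieving the optimal value $g^*$ is given by

$$\mu^*(\mathbf{Q}) = \arg\min_{\mathbf{u} \in \mathcal{U}} \{g(\mathbf{Q}, \mathbf{u}) + \mathbb{E}[V(\mathbf{Q}')]\}, \quad \forall \mathbf{Q} \in \mathcal{Q}. \tag{11}$$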

Proof: Please see Appendix A. ■ 

From Lemma 1, we observe that $\mu^*$ given by (11) depends
on $\mathbf{Q}$ through the value function $V(\cdot)$. Obtaining $V(\cdot)$ involves
solving the Bellman equation in (10) for all $\mathbf{Q} \in \mathcal{Q}$, which
does not admit a closed-form solution in general m .Standard 
numerical solutions such as value iteration and policy iteration 
are usually computationally impractical to implement, and do 
not typically yield many design insights Km. Thus, it is highly 
desirable to study the structural properties of the optimal 
policy $\mu^*$.
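
As a rough numerical illustration of the Bellman equation, the sketch below runs relative value iteration on a deliberately tiny instance: a single BS, a single content, one request queue with at most one arrival per slot. This toy model and all its parameters (`cap`, `lam`, `w`, `power`) are assumptions made for illustration and are much simpler than the HetNet model of this paper.

```python
def relative_value_iteration(cap=10, lam=0.5, w=1.0, power=4.0,
                             max_iters=5000, tol=1e-9):
    """Relative value iteration for a toy single-queue multicast problem.

    State q in {0, ..., cap}; action 1 = multicast (clears the queue),
    action 0 = idle; one request arrives per slot with probability lam.
    Per-stage cost is q + w * power * action.
    """
    def q_value(q, u, V):
        kept = 0 if u == 1 else q
        stay, grow = min(kept, cap), min(kept + 1, cap)
        expected_next = (1.0 - lam) * V[stay] + lam * V[grow]
        return q + w * power * u + expected_next

    V = [0.0] * (cap + 1)
    for _ in range(max_iters):
        T = [min(q_value(q, 0, V), q_value(q, 1, V)) for q in range(cap + 1)]
        new_V = [t - T[0] for t in T]          # normalize at reference state 0
        gap = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if gap < tol:
            break
    theta = min(q_value(0, 0, V), q_value(0, 1, V))   # since V[0] = 0
    policy = [0 if q_value(q, 0, V) <= q_value(q, 1, V) else 1
              for q in range(cap + 1)]
    return theta, V, policy

theta, V, policy = relative_value_iteration()
print(theta, policy)
```

For these assumed parameters the computed policy idles at queue lengths 0 and 1 and multicasts from length 2 upward, with an average cost of 2, which gives a small-scale picture of the delay and power tradeoff behind a threshold-type policy.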

IV. Optimality Properties 

Problem 1 can be viewed as a problem of scheduling a 
single broadcast server (the MBS) or multiple multicast servers 
(the SBSs) to parallel (request) queues with general arrivals. 
The structural analysis is more challenging than the existing 
structural analysis for the scheduling of a single broadcast 
server. First, by RVIA and the special structure of the request 
queue dynamics, we can prove the following property of the 
value function. 

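Lemma 2 (Monotonicity of $V(\mathbf{Q})$): For any $\mathbf{Q}^1, \mathbf{Q}^2 \in \mathcal{Q}$
such that $\mathbf{Q}^2 \succeq \mathbf{Q}^1$, we have $V(\mathbf{Q}^2) \geq V(\mathbf{Q}^1)$.^3

Proof: Please see Appendix B. ■

^3 The notation $\succeq$ indicates component-wise $\geq$.

Next, we introduce the state-action cost function:

$$J(\mathbf{Q}, \mathbf{u}) \triangleq g(\mathbf{Q}, \mathbf{u}) + \mathbb{E}[V(\mathbf{Q}')]. \tag{12}$$

Note that $J(\mathbf{Q}, \mathbf{u})$ is related to the R.H.S. of the Bellman
equation in (10). Then, based on $J(\mathbf{Q}, \mathbf{u})$, we introduce:

$$\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}) \triangleq J(\mathbf{Q}, \mathbf{u}) - J(\mathbf{Q}, \mathbf{v}). \tag{13}$$

Note that $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}) = -\Delta_{\mathbf{v},\mathbf{u}}(\mathbf{Q})$. Action $\mathbf{u}$ is said to
dominate $\mathbf{v}$ at state $\mathbf{Q}$ if $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}) \leq 0$. In particular, by Lemma 1,
if $\mathbf{u}$ dominates all $\mathbf{v} \in \mathcal{U}$ at state $\mathbf{Q}$, then $\mu^*(\mathbf{Q}) = \mathbf{u}$. Based
on Lemma 2, we have the following property of the function
defined in (13).

Lemma 3 (Monotonicity of $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q})$): For any $\mathbf{Q} \in \mathcal{Q}$ and
$\mathbf{u}, \mathbf{v} \in \mathcal{U}$, $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q})$ has the following properties.

1) If $u_0 \in \mathcal{M}_0$, then $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q})$ is monotonically
non-increasing with $Q_{0,m}$ and $Q_{n,m}$ for all $n \in \mathcal{N}_m$, where
$m = u_0$.

2) If $u_n \in \mathcal{M}_n$ for some $n \in \mathcal{N}^+$, then $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q})$ is
monotonically non-increasing with $Q_{n,m}$, where $m = u_n$.

Proof: Please see Appendix C. ■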

Lemma 3 indicates that, if $\mathbf{u}$ dominates $\mathbf{v}$ at some state
$\mathbf{Q}$, then by increasing $Q_{0,u_0}$ or $Q_{n,u_0}$ for any $n \in \mathcal{N}_{u_0}$
and $u_0 \neq 0$, or by increasing $Q_{n,u_n}$ for any $n \in \mathcal{N}^+$ and
$u_n \neq 0$, $\mathbf{u}$ still dominates $\mathbf{v}$. The properties of $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q})$ in
Lemma 3 are similar to the diminishing-return property of
submodular functions used in the existing structural analysis
lISTl . Lemma 3 stems from the special properties of multicasting
and is essential to characterize the optimality properties.
By Lemma 3, we can characterize the structural properties of
the optimal policy $\mu^*$. We start with several definitions. Define

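$$\Phi_{\mathbf{u}}(\mathbf{Q}_{-n,-m}) \triangleq \{Q_{n,m} \,|\, Q_{n,m} \in \mathcal{Q}_{n,m} \text{ and }
\Delta_{\mathbf{u},\mathbf{v}}(Q_{n,m}, \mathbf{Q}_{-n,-m}) \leq 0\ \forall \mathbf{v} \in \mathcal{U} \text{ and } \mathbf{v} \neq \mathbf{u}\}, \tag{14}$$

where $\mathbf{Q}_{-n,-m} \triangleq (Q_{i,j})_{i \in \mathcal{N}, j \in \mathcal{M}_i, (i,j) \neq (n,m)}$. Based on
$\Phi_{\mathbf{u}}(\cdot)$, we define:

$$\phi^{+}_{\mathbf{u}}(\mathbf{Q}_{-n,-m}) \triangleq \begin{cases} \max \Phi_{\mathbf{u}}(\mathbf{Q}_{-n,-m}), & \text{if } \Phi_{\mathbf{u}}(\mathbf{Q}_{-n,-m}) \neq \emptyset \\ -\infty, & \text{otherwise} \end{cases} \tag{15}$$

$$\phi^{-}_{\mathbf{u}}(\mathbf{Q}_{-n,-m}) \triangleq \begin{cases} \min \Phi_{\mathbf{u}}(\mathbf{Q}_{-n,-m}), & \text{if } \Phi_{\mathbf{u}}(\mathbf{Q}_{-n,-m}) \neq \emptyset \\ +\infty, & \text{otherwise} \end{cases} \tag{16}$$

Let $\mathbf{0}$ denote the $1 \times (N + 1)$ vector with all entries 0. Then,
we have the following theorem.

Theorem 1 (Structural properties of $\mu^*$): For any $\mathbf{Q} \in \mathcal{Q}$,
the optimal policy $\mu^*$ has the following structural properties.

1) $\mu^*(\mathbf{Q}) = \mathbf{0}$ for all
$\mathbf{Q} \in \mathcal{Q}_0 \triangleq \{\mathbf{Q} \,|\, Q_{n,m} \leq \phi^{+}_{\mathbf{0}}(\mathbf{Q}_{-n,-m})\ \forall m \in \mathcal{M}_n \text{ and } n \in \mathcal{N}\}$.

2) If $\exists n \in \mathcal{N}$, such that $u^*_n \in \mathcal{M}_n$, then $\mu^*(\mathbf{Q}) = \mathbf{u}^*$ for
all $\mathbf{Q} \in \mathcal{Q}$ such that

$$Q_{n,m} \geq \phi^{-}_{\mathbf{u}^*}(\mathbf{Q}_{-n,-m}), \tag{17}$$

where $m = u^*_n$. Moreover, $\phi^{-}_{\mathbf{u}^*}(\mathbf{Q}_{-0,-m})$ is monotonically
non-increasing with $Q_{n,m}$ for all $n \in \mathcal{N}_m$.

Proof: Please see Appendix D. ■

We illustrate the analytical results of Theorem 1 in Fig. 3,
where the optimal policy is computed numerically using policy
iteration algorithm (PIA) [32, Chapter 8.6]. We observe from
Fig. 3(a) that, if the queue state falls in the region of blue
squares (i.e., $\mathcal{Q}_0$), the optimal control is (0,0), i.e., both the
MBS and the SBS keep idle. Hence, we refer to $\mathcal{Q}_0$ as the idle
region of the optimal policy. From Fig. 3(b)-(d), we observe
that given $\mathbf{Q}_{-n,-m}$, the scheduling for content $m \in \mathcal{M}_n$ by
BS $n \in \mathcal{N}$ is of threshold type (Property 2 of Theorem 1). This
indicates that it is not efficient to schedule content $m$ by BS $n$
when $Q_{n,m}$ is small (i.e., the delay cost is small), as a higher
power cost per request is consumed. This shows the tradeoff
between the delay cost and the power cost. Fig. 3(c) illustrates
the monotonically non-increasing property of $\phi^{-}_{\mathbf{u}}(\mathbf{Q}_{-0,-1})$ in
terms of $Q_{1,1}$. This reveals that the MBS is more willing to
multicast content 1 when $Q_{1,1}$ is large. The reason is that the
MBS can satisfy more requests than any SBS. These optimality
properties provide design insights for multicast scheduling in
practical cache-enabled HetNets.

V. Structure-Aware Optimal Algorithm

The results in Theorem 1 can be exploited to substantially
reduce the computational complexity for solving the
Bellman equation in (10) in obtaining $\mu^*$. In particular, by
Property 2 in Theorem 1, for all $\mathbf{Q} \in \mathcal{Q}$, $\mathbf{u} \in \mathcal{U}$ and
$\mathbf{Q}' = (Q'_{n,m})_{n \in \mathcal{N}, m \in \mathcal{M}_n}$ satisfying

$$\begin{cases} Q'_{n,m} \geq Q_{n,m}, & \text{if } u_n = m \\ Q'_{n,m} = Q_{n,m}, & \text{if } u_n \neq m \end{cases} \tag{18}$$

for each $n \in \mathcal{N}$ and $m \in \mathcal{M}_n$, we have

$$\mu^*(\mathbf{Q}) = \mathbf{u} \Rightarrow \mu^*(\mathbf{Q}') = \mathbf{u}. \tag{19}$$

Therefore, by incorporating the property in (19) into the
standard PIA, we develop a structure-aware algorithm in
Algorithm 1, which is referred to as the structured policy
iteration algorithm (SPIA). According to Theorem 8.6.6 and
Chapter 8.11.2 in [32], we know that SPIA converges to the
optimal policy $\mu^*$ in (11) within a finite number of iterations,
and hence is an optimal algorithm.

Algorithm 1 Structured Policy Iteration Algorithm

1: Set $\mu^*_0(\mathbf{Q}) = \mathbf{0}$ for all $\mathbf{Q} \in \mathcal{Q}$, select reference state $\mathbf{Q}^{\S}$,
   and set $l = 0$.

2: (Policy Evaluation) Given policy $\mu^*_l$, compute the average
   cost $\theta_l$ and value function $V_l(\mathbf{Q})$ from the linear system
   of equations^4

   $$\begin{cases} \theta_l + V_l(\mathbf{Q}) = g(\mathbf{Q}, \mu^*_l(\mathbf{Q})) + \mathbb{E}[V_l(\mathbf{Q}')], & \forall \mathbf{Q} \in \mathcal{Q} \\ V_l(\mathbf{Q}^{\S}) = 0 \end{cases} \tag{20}$$

   where $\mathbf{Q}'$ is defined in Lemma 1.

3: (Structured Policy Improvement) Obtain a new policy
   $\mu^*_{l+1}$, where for each $\mathbf{Q} \in \mathcal{Q}$, $\mu^*_{l+1}(\mathbf{Q})$ is such that:
   if $\exists n \in \mathcal{N}$ and $\mathbf{Q}' \in \mathcal{Q}$ such that $\mu^*_{l+1}(\mathbf{Q}') = \mathbf{u}$, $u_n \in \mathcal{M}_n$,
   $Q'_{n,m} \leq Q_{n,m}$, and $Q'_{i,j} = Q_{i,j}$ for all $(i,j) \neq (n,m)$,
   where $m = u_n$ then
      $\mu^*_{l+1}(\mathbf{Q}) = \mathbf{u}$.
   else
      $\mu^*_{l+1}(\mathbf{Q}) = \arg\min_{\mathbf{u} \in \mathcal{U}} \{g(\mathbf{Q}, \mathbf{u}) + \mathbb{E}[V_l(\mathbf{Q}')]\}$.
   endif

4: Go to Step 2 (with $l \leftarrow l + 1$) until the policy no longer
   changes, i.e., $\mu^*_{l+1} = \mu^*_l$.

Note that, in Step 3 (structured policy improvement) of
Algorithm 1, we do not need to perform the minimization
over $\mathcal{U}$ when the condition is satisfied (which is the case for a
large number of queue states in $\mathcal{Q}$). This can be seen in Fig. 3.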
"^The solution to ]loJ can be obtained directly using Gaussian elimination 
or iteratively using the relative value iteration method tnj. 
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(a) Whole space. 


(b) $Q_{0,1} = 0$.

(c) $Q_{0,2} = 5$.

(d) $Q_{1,1} = 1$.

Fig. 3: Structure of the optimal multicast scheduling. $N = 1$, $\mathcal{K} = \{1, 2, 3, 4\}$, $\mathcal{M} = \{1, 2\}$, $\mathcal{K}_1 = \{1, 2\}$, and $\mathcal{M}_1 = \{1\}$. In
each slot, each user requests one content, which is content 1 with probability 0.6 and content 2 with probability 0.4.


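In contrast, in the standard policy improvement step of PIA, the
new policy $\mu^*_{l+1}$ is obtained by:

$$\mu^*_{l+1}(\mathbf{Q}) = \arg\min_{\mathbf{u}\in\mathcal{U}} \left\{ g(\mathbf{Q},\mathbf{u}) + \mathbb{E}[V_l(\mathbf{Q}')] \right\}, \quad \forall \mathbf{Q} \in \mathcal{Q}. \qquad (21)$$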

By (21), obtaining $\mu^*_{l+1}$ requires a brute-force minimization
over $\mathcal{U}$ for each $\mathbf{Q} \in \mathcal{Q}$, which can be computationally very
expensive when the number of contents $M$ and the number of SBSs
$N$ are large. By comparing the structured policy improvement
step of SPIA with the standard policy improvement step of
PIA, we can see that SPIA can achieve considerable computational
savings. Note that the complexity of SPIA depends
on the specific algorithm used for solving the linear system of
equations in (20). For instance, using Gaussian elimination
for solving (20), the complexity of each iteration in SPIA is
$O(|\mathcal{Q}|^3 + |\mathcal{U}||\mathcal{Q}|^2)$ [32].
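The saving from a structured policy improvement step can be illustrated on a toy single-queue example. The sketch below is not Algorithm 1 itself: it assumes a hypothetical one-dimensional queue, a hypothetical state-action cost `toy_q`, and the threshold structure of Property 2 (once a non-idle action is the minimizer at some queue length, it remains the minimizer at all larger queue lengths). Under these assumptions the argmin is skipped for every state above the threshold.

```python
from typing import Callable, List, Optional, Tuple


def structured_improvement(
    num_states: int,
    actions: List[int],
    q_value: Callable[[int, int], float],
) -> Tuple[List[int], int]:
    """Policy improvement over queue lengths 0..num_states-1.

    Assumption (illustrative): a threshold structure holds, so once a
    non-idle action minimizes q_value at queue length q, it is reused for
    all larger q without another minimization.
    Returns the policy and the number of argmin evaluations performed.
    """
    policy: List[int] = []
    minimizations = 0
    locked: Optional[int] = None  # action known to be optimal for larger q
    for q in range(num_states):
        if locked is not None:
            policy.append(locked)  # structure lets us skip the argmin
            continue
        best = min(actions, key=lambda u: q_value(q, u))
        minimizations += 1
        policy.append(best)
        if best != 0:  # action 0 denotes "idle"
            locked = best
    return policy, minimizations


def toy_q(q: int, u: int) -> float:
    """Hypothetical state-action cost: idling costs q, scheduling costs 3.5."""
    return float(q) if u == 0 else 3.5


policy, count = structured_improvement(8, [0, 1], toy_q)
```

In this toy run only the states up to the threshold require a minimization; the brute-force step in (21) would instead minimize at every state.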

Although the proposed structure-aware optimal algorithm,
i.e., SPIA, can alleviate the computational burden of the
standard PIA, it still suffers from the curse of dimensionality
[19], due to the exponential growth of the cardinality of the
system state space ($|\mathcal{Q}| = \prod_{n\in\mathcal{N}}\prod_{m\in\mathcal{M}_n} |\mathcal{Q}_{n,m}|$). When
the number of SBSs and the cache sizes are large, the
resulting huge state space may render SPIA computationally
impractical.


VI. Low Complexity Suboptimal Solution 

To further reduce the complexity of the proposed structure- 
aware optimal algorithm (i.e., SPIA) and relieve the curse 
of dimensionality, we would like to develop low-complexity 
suboptimal solutions. Note that the structural properties of 
the optimal policy may be one key reason that leads to 
good performance. Thus, in this section, we focus on design 
of suboptimal solutions which can maintain the structures 
of the optimal policy. Specifically, based on a randomized 
base policy, we first propose a low-complexity suboptimal 
deterministic policy using approximate dynamic programming
[19]. We show that the deterministic policy improves the
randomized base policy and possesses similar properties to the 
optimal policy. Then, we design a low-complexity structured 
algorithm to compute the proposed policy, by exploiting these 
structural properties. 


A. Low Complexity Suboptimal Policy 

The structural properties of the optimal policy come from 
the monotonicity property of the value function. Therefore, to 
maintain these structures in designing a suboptimal solution, 
we consider a value function decomposition method that can 
preserve the structural properties of the value function. Based 
on this decomposition, we propose a low-complexity subop¬ 
timal deterministic policy, which will be shown to possess 
similar structural properties to the optimal policy. We first 
introduce a randomized base policy. 

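Definition 2 (Randomized Base Policy): A randomized
base policy for the multicast scheduling control $\hat{\mu}$ is given
by a distribution on the feasible multicast scheduling action
space $\mathcal{U}$.

We restrict our attention to randomized unichain base policies.
Let $\hat{\theta}$ and $\hat{V}(\mathbf{Q})$ denote the average cost and the value
function under a randomized unichain base policy $\hat{\mu}$, respectively.
By [19, Proposition 4.2.2], there exists $(\hat{\theta}, \{\hat{V}(\mathbf{Q})\})$
satisfying:

$$\hat{\theta} + \hat{V}(\mathbf{Q}) = \mathbb{E}^{\hat{\mu}}[g(\mathbf{Q},\mathbf{u})] + \sum_{\mathbf{Q}'\in\mathcal{Q}} \mathbb{E}^{\hat{\mu}}\big[\Pr[\mathbf{Q}'|\mathbf{Q},\mathbf{u}]\big]\hat{V}(\mathbf{Q}'), \quad \forall \mathbf{Q} \in \mathcal{Q}. \qquad (22)$$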


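Next, we show that $\hat{V}(\mathbf{Q})$ has an additive separable structure
in the following lemma.

Lemma 4 (Additive Separable Structure of $\hat{V}(\mathbf{Q})$):
Given any randomized unichain base policy $\hat{\mu}$, the
value function $\{\hat{V}(\mathbf{Q})\}$ in (22) can be expressed
as $\hat{V}(\mathbf{Q}) = \sum_{n\in\mathcal{N}}\sum_{m\in\mathcal{M}_n}\hat{V}_{n,m}(Q_{n,m})$, where
$\{\hat{V}_{n,m}(Q_{n,m})\}$ satisfies:

$$\hat{\theta}_{n,m} + \hat{V}_{n,m}(Q_{n,m}) = \mathbb{E}^{\hat{\mu}}[g_{n,m}(Q_{n,m},\mathbf{u})] + \sum_{Q'_{n,m}\in\mathcal{Q}_{n,m}} \mathbb{E}^{\hat{\mu}}\big[\Pr[Q'_{n,m}|Q_{n,m},\mathbf{u}]\big]\hat{V}_{n,m}(Q'_{n,m}), \quad \forall Q_{n,m} \in \mathcal{Q}_{n,m}, \qquad (23)$$

for all $n \in \mathcal{N}$ and $m \in \mathcal{M}_n$. Here, $\hat{\theta}_{n,m}$ and $\hat{V}_{n,m}(Q_{n,m})$
denote the per-BS-content average cost and value function
under $\hat{\mu}$, respectively, $g_{n,m}(Q_{n,m},\mathbf{u}) = Q_{n,m} + w\mathbf{1}(u_n = m)p(n,m)$,
and $\Pr[Q'_{n,m}|Q_{n,m},\mathbf{u}] = \Pr[Q_{n,m}(t+1) = Q'_{n,m} \,|\, Q_{n,m}(t) = Q_{n,m}, \mathbf{u}(t) = \mathbf{u}]$.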

Proof: Please see Appendix E. ■ 
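Each per-BS-content equation of the form (23) is a small linear system in the unknowns $\hat{\theta}_{n,m}$ and $\{\hat{V}_{n,m}(\cdot)\}$. The sketch below shows one way such a system could be solved by Gaussian elimination for a single queue. It is an illustration under stated assumptions, not the authors' implementation: `p` stands for the transition matrix already averaged over the base policy, `g` for the averaged per-stage cost, and the value at state 0 is pinned to zero to remove the additive ambiguity (a common normalization that the text does not spell out).

```python
from typing import List, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def evaluate_average_cost(
    p: List[List[float]], g: List[float]
) -> Tuple[float, List[float]]:
    """Solve theta + V(s) = g(s) + sum_s' p[s][s'] V(s') with V(0) = 0.

    Unknown vector is [theta, V(1), ..., V(n-1)].
    """
    n = len(g)
    a = [[0.0] * n for _ in range(n)]
    for s in range(n):
        a[s][0] = 1.0  # coefficient of theta
        for j in range(1, n):
            a[s][j] = (1.0 if s == j else 0.0) - p[s][j]
    x = solve_linear(a, g)
    return x[0], [0.0] + x[1:]


# Hypothetical two-state chain used only to exercise the solver.
p = [[0.5, 0.5], [0.5, 0.5]]
g = [0.0, 2.0]
theta, values = evaluate_average_cost(p, g)
```

For this toy chain the stationary distribution is uniform, so the average cost is 1 and the relative value of state 1 is 2, which can be checked by substituting back into the equation.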





















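To alleviate the curse of dimensionality, we approximate the
value function in (10) with $\hat{V}(\mathbf{Q})$:

$$V(\mathbf{Q}) \approx \hat{V}(\mathbf{Q}) = \sum_{n\in\mathcal{N}}\sum_{m\in\mathcal{M}_n} \hat{V}_{n,m}(Q_{n,m}), \qquad (24)$$

where $\{\hat{V}_{n,m}(Q_{n,m})\}$ is given by the per-BS-content fixed
point equation in (23). Then, we develop the following
low-complexity deterministic policy $\hat{\mu}^*$:

$$\hat{\mu}^*(\mathbf{Q}) = \arg\min_{\mathbf{u}\in\mathcal{U}} \Big\{ g(\mathbf{Q},\mathbf{u}) + \sum_{n\in\mathcal{N}}\sum_{m\in\mathcal{M}_n} \mathbb{E}[\hat{V}_{n,m}(Q'_{n,m})] \Big\}, \quad \forall \mathbf{Q} \in \mathcal{Q}. \qquad (25)$$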


In the following proposition, we show that the deterministic
policy $\hat{\mu}^*$ generated by (25) always improves the corresponding
randomized unichain base policy $\hat{\mu}$.

Proposition 1 (Performance Improvement): $\hat{\theta}^* \le \hat{\theta}$, where
$\hat{\theta}$ is the average network cost under a randomized unichain
base policy $\hat{\mu}$ and $\hat{\theta}^*$ is the average network cost under the
proposed solution.

Proof: This result follows directly from 1^ . ■ 

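Note that, to obtain $\hat{\mu}^*$ in (25) via solving (23) for all $n \in \mathcal{N}$
and $m \in \mathcal{M}_n$, we only need to compute $\{\hat{V}_{n,m}(Q_{n,m})\}$
(a total of $O(\sum_{n\in\mathcal{N}}\sum_{m\in\mathcal{M}_n} |\mathcal{Q}_{n,m}|)$ values). The computational
complexity is much lower than computing $\{V(\mathbf{Q})\}$ (a
total of $O(\prod_{n\in\mathcal{N}}\prod_{m\in\mathcal{M}_n} |\mathcal{Q}_{n,m}|)$ values) via solving (10)
in obtaining $\mu^*$ in (11).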
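The gap between the two counts can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that each per-BS-content queue takes values in $\{0, \ldots, N_{n,m}\}$, so that $|\mathcal{Q}_{n,m}| = N_{n,m} + 1$; the function names are hypothetical.

```python
from math import prod
from typing import List


def joint_value_count(cardinalities: List[List[int]]) -> int:
    """Number of values in {V(Q)}: product of per-queue cardinalities."""
    return prod(size for per_bs in cardinalities for size in per_bs)


def separable_value_count(cardinalities: List[List[int]]) -> int:
    """Number of values in {V_hat_{n,m}(Q_{n,m})}: sum of cardinalities."""
    return sum(size for per_bs in cardinalities for size in per_bs)


# Illustrative setting: one MBS caching 3 contents, one SBS caching 2,
# and an assumed queue bound of 4 (hence 5 states per queue).
cardinalities = [[5, 5, 5], [5, 5]]
joint = joint_value_count(cardinalities)
separable = separable_value_count(cardinalities)
```

With five queues of five states each, the joint value function has 3125 entries while the separable approximation needs only 25.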


B. Structural Properties of Suboptimal Policy 

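In this part, we investigate the structural properties of the
suboptimal policy $\hat{\mu}^*$ in (25). Along the lines of the structural
analysis for the optimal policy in Section IV, we first introduce
the state-action cost function for $\hat{\mu}^*$:

$$\hat{J}(\mathbf{Q},\mathbf{u}) = g(\mathbf{Q},\mathbf{u}) + \sum_{n\in\mathcal{N}}\sum_{m\in\mathcal{M}_n} \mathbb{E}[\hat{V}_{n,m}(Q'_{n,m})]. \qquad (26)$$

Note that $\hat{J}(\mathbf{Q},\mathbf{u})$ is related to the R.H.S. of (25). Then, we
introduce:

$$\hat{\Delta}_{\mathbf{u},\mathbf{v}}(\mathbf{Q}) = \hat{J}(\mathbf{Q},\mathbf{u}) - \hat{J}(\mathbf{Q},\mathbf{v}), \qquad (27)$$

$$\hat{\Phi}_{\mathbf{u}}(\mathbf{Q}_{-n,-m}) = \{Q_{n,m} \,|\, Q_{n,m} \in \mathcal{Q}_{n,m} \text{ and } \hat{\Delta}_{\mathbf{u},\mathbf{v}}(Q_{n,m}, \mathbf{Q}_{-n,-m}) \le 0 \;\; \forall \mathbf{v} \in \mathcal{U} \text{ and } \mathbf{v} \neq \mathbf{u}\}. \qquad (28)$$

By replacing $\Phi(\cdot)$ with $\hat{\Phi}(\cdot)$ in (15) and (16), we have
$\hat{\phi}_{\mathbf{0}}^{+}(\cdot)$ and $\hat{\phi}_{\mathbf{u}}^{-}(\cdot)$, respectively. In the following theorem, we show
that the proposed deterministic low-complexity suboptimal
policy possesses similar structural properties to the optimal
policy. This similarity would be one key reason for the good
performance of the proposed suboptimal policy, as will be
shown in Section VII.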

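Theorem 2 (Structural properties of $\hat{\mu}^*$): For any $\mathbf{Q} \in \mathcal{Q}$,
the suboptimal policy $\hat{\mu}^*$ has the following structural properties.

1) $\hat{\mu}^*(\mathbf{Q}) = \mathbf{0}$ for all $\mathbf{Q} \in \hat{\mathcal{Q}}_0 \triangleq \{\mathbf{Q} \,|\, Q_{n,m} \le \hat{\phi}_{\mathbf{0}}^{+}(\mathbf{Q}_{-n,-m}), \forall m \in \mathcal{M}_n \text{ and } n \in \mathcal{N}\}$.

2) If $\exists n \in \mathcal{N}$ such that $u_n \in \mathcal{M}_n$, then $\hat{\mu}^*(\mathbf{Q}) = \mathbf{u}$ for
all $\mathbf{Q} \in \mathcal{Q}$ such that

$$Q_{n,m} \ge \hat{\phi}_{\mathbf{u}}^{-}(\mathbf{Q}_{-n,-m}), \qquad (29)$$

where $m = u_n$. Moreover, $\hat{\phi}_{\mathbf{u}}^{-}(\mathbf{Q}_{-0,-m})$ is monotonically
non-increasing with $Q_{n,m}$ for all $n \in \mathcal{N}_m^+$.

Proof: Please see Appendix F. ■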


C. Structured Suboptimal Algorithm 

By employing the relation between $\hat{\mu}$ and $\hat{\mu}^*$ as well
as the structural properties of $\hat{\mu}^*$ in Theorem 2, we develop
a low-complexity algorithm, referred to as the structured
suboptimal algorithm (SSA), to obtain $\hat{\mu}^*$ in (25),
as summarized in Algorithm 2. SSA requires only
one iteration to obtain $\hat{\mu}^*$. The complexity depends on the
specific algorithm used for solving (23). For instance, using
Gaussian elimination for solving (|2^ . the complexity of SSA 
is \Qn,m\^ + mQn 


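Algorithm 2 Structured Suboptimal Algorithm

1: Given a randomized base unichain policy $\hat{\mu}$, compute the
per-BS-content value function $\{\hat{V}_{n,m}(Q_{n,m})\}$ for all $n \in \mathcal{N}$
and $m \in \mathcal{M}_n$ by solving the linear system of equations
in (23).

2: Obtain the deterministic policy $\hat{\mu}^*$, where for each $\mathbf{Q} \in \mathcal{Q}$,
$\hat{\mu}^*(\mathbf{Q})$ is such that:

   if $\exists n \in \mathcal{N}$ and $\mathbf{Q}' \in \mathcal{Q}$ such that $\hat{\mu}^*(\mathbf{Q}') = \mathbf{u}$, $u_n \in \mathcal{M}_n$,
   $Q'_{n,m} \le Q_{n,m}$, and $Q'_{i,j} = Q_{i,j}$ for all $(i,j) \neq (n,m)$,
   where $m = u_n$ then
      $\hat{\mu}^*(\mathbf{Q}) = \mathbf{u}$.
   else
      Compute $\hat{\mu}^*(\mathbf{Q})$ using (25).
   endif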


Now, we compare the computational complexity of SSA and
SPIA. SSA is similar to one iteration of SPIA. As discussed in
Section VI-A, in Step 1 of SSA, the number of value functions
required to be computed is much smaller than that in each
iteration of the policy evaluation step (Step 2 in Algorithm 1)
of SPIA. In Step 2 of SSA, the number of optimizations
to be solved is comparable to that in each iteration of
the structured policy improvement step of SPIA. Therefore,
SSA has a significantly lower computational complexity than
SPIA.


VII. Numerical results and discussions 

In this section, we evaluate the performance of the proposed 
optimal and suboptimal solutions through numerical examples. 
In the simulations, we assume that in each slot, each user
independently requests one content, which is content $m$ with
probability $P_m$. We assume that $\{P_m\}$ follows a (normalized)
Zipf distribution with parameter $\alpha$. For simplicity, we
assume that each content is of the same size and each SBS has
the same cache size. These assumptions are commonly used
in the literature on wireless caching in HetNets, e.g., [8]-[12].
As the coverage areas of the SBSs are assumed to be disjoint,

^The solution to (ID can be obtained directly using Gaussian elimination 
or iteratively using the relative value iteration method QD 
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(a) Average network cost. 



(b) Average delay cost. 



(c) Average power cost. 


Fig. 4: Average network, delay, and power costs versus weight w. 


we adopt a commonly used content placement strategy, i.e., 
each SBS stores the most popular contents 0, im, M- 
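The request model and the placement rule above can be sketched in a few lines. The code assumes the usual Zipf form $P_m \propto m^{-\alpha}$ with contents indexed by popularity rank, which the text does not state explicitly; the function names are hypothetical.

```python
from typing import List


def zipf_popularity(num_contents: int, alpha: float) -> List[float]:
    """Normalized Zipf popularity, assuming P_m proportional to m**(-alpha)."""
    weights = [m ** (-alpha) for m in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def most_popular_placement(popularity: List[float], cache_size: int) -> List[int]:
    """Return the 1-based indices of the cache_size most popular contents."""
    ranked = sorted(range(len(popularity)), key=lambda i: popularity[i], reverse=True)
    return [i + 1 for i in ranked[:cache_size]]


# Illustrative values matching the later simulations: 20 contents,
# alpha = 0.75, and an SBS cache holding 10 contents.
popularity = zipf_popularity(20, 0.75)
cached = most_popular_placement(popularity, 10)
```

A larger `alpha` concentrates more probability mass on the first few contents, which is the "skewness" effect examined in the simulations.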

First, we compare the average costs of the proposed optimal
and suboptimal solutions with two baseline policies, i.e., a
randomized base policy in Definition 2 and a greedy policy.
In particular, for the randomized base policy, in each slot, we
randomly select the MBS or all the SBSs to operate, with
probability $P_{MBS}$ and $1 - P_{MBS}$, respectively. Then, if BS
$n$ is allowed to operate, it keeps idle with probability $P^n_{idle}$ or
schedules one cached content $m \in \mathcal{M}_n$ for multicasting with
probability $P^n_m = (1 - P^n_{idle})P_m / \sum_{m'\in\mathcal{M}_n} P_{m'}$. Note that
the suboptimal policy is obtained based on this randomized
base policy, as illustrated in Section VI. In the following
simulations, we set $P_{MBS} = 0.5$ and $P^n_{idle} = 0.3$ for all
$n \in \mathcal{N}$. To illustrate the greedy policy, we first introduce its
cost function $C(\mathbf{Q},\mathbf{u}) = \sum_{n\in\mathcal{N}} C_n(\mathbf{Q}, u_n)$, where $\mathbf{Q} \in \mathcal{Q}$,
$\mathbf{u} \in \mathcal{U}$, and

$$C_0(\mathbf{Q},u_0) \triangleq \begin{cases} w p(0,m) - Q_{0,m} - \sum_{n\in\mathcal{N}_m^+} Q_{n,m}, & \text{if } u_0 = m \in \mathcal{M}_0 \\ 0, & \text{if } u_0 = 0 \end{cases}$$

$$C_n(\mathbf{Q},u_n) \triangleq \begin{cases} w p(n,m) - Q_{n,m}, & \text{if } u_n = m \in \mathcal{M}_n \\ 0, & \text{if } u_n = 0 \end{cases}$$

For the greedy policy, in each slot, we choose the multicast
scheduling action that minimizes the cost function, i.e.,
$\mathbf{u}(t) = \arg\min_{\mathbf{u}(t)\in\mathcal{U}} C(\mathbf{Q}(t), \mathbf{u}(t))$. This greedy policy can
also be treated as an approximate solution to Problem [T] 
through approximating V(Q) with Q in (fTTl i. Note that this 
policy determines the scheduling action myopically, without 
accurately considering the impact of the action on the future 
costs. This type of policy is a commonly used baseline policy 
in the literature (see e.g., and references therein). 

Fig. 4 illustrates the average network cost, delay cost, and
power cost versus the weight of the power cost (i.e., $w$) for a
network with 1 MBS, 1 SBS, 4 users, and 3 contents ($\mathcal{M} = \{1, 2, 3\}$).
We set $\alpha = 0.75$, $|\mathcal{K}_0| = |\mathcal{K}_1| = 2$, $\mathcal{M}_0 = \{1, 2, 3\}$,
$\mathcal{M}_1 = \{1\}$, $p(0,m) = 4$ for all $m \in \mathcal{M}_0$, and $p(1,m) = 2$ for
all $m \in \mathcal{M}_1$. It can be observed in Fig. 4(a) that the average
network costs of the proposed optimal and suboptimal policies
are very close to each other, and are lower than those of the
randomized base policy and the greedy policy. The reason is
that the proposed two policies can make foresighted decisions
by better utilizing system state information and considering the
immediate cost as well as the future costs. From Fig. 4(b) and
Fig. 4(c), we can see that, as $w$ increases, the average power
costs of the proposed two policies decrease, at the expense of
the average delay costs. This reveals the tradeoff between the
average delay cost and the average power cost.

In Fig. 5, Fig. 6, and Fig. 7, we investigate the impacts of
the Zipf parameter, the cache size of SBSs, and the number of
users on the performance of the proposed suboptimal policy
and the two baseline policies, respectively. We consider a
network with 1 MBS, 2 SBSs and 20 contents. We set $w = 3$,
$\mathcal{M}_0 = \mathcal{M}$, $|\mathcal{K}_0| : |\mathcal{K}_1| : |\mathcal{K}_2| = 1:2:2$, $p(0,m) = 30$ for all
$m \in \mathcal{M}_0$, and $p(n,m) = 3$ for $n = 1, 2$ and $m \in \mathcal{M}_n$. Here,
$|\mathcal{K}_0| : |\mathcal{K}_1| : |\mathcal{K}_2|$ indicates the ratio of the numbers of
users in $\mathcal{K}_0$, $\mathcal{K}_1$, and $\mathcal{K}_2$. From Figs. 5-7, it can be seen that
the proposed suboptimal policy outperforms the two baseline
policies in terms of the average network cost.

Fig. 5 illustrates the average costs versus the Zipf parameter
$\alpha$ for the aforementioned three policies. The parameter $\alpha$
determines the "skewness" of the content popularity distribution,
i.e., a large $\alpha$ indicates that a small number of contents
account for the majority of content requests. We observe
that for the proposed suboptimal policy, as $\alpha$ increases, the
average network cost decreases and better delay performance
can be achieved with less transmission power. This reveals
that the proposed suboptimal policy can utilize caching more
effectively as the content popularity distribution gets steeper.

Fig. 6 illustrates the average costs versus the cache size of
SBSs $|\mathcal{M}_s|$. For the proposed suboptimal policy, we observe
that as $|\mathcal{M}_s|$ increases, the average network cost
decreases (slightly) and the average power cost decreases
without sacrificing the delay performance. The reason is that,
when more contents are cached in SBSs, the transmission
opportunities of SBSs increase and the proposed suboptimal
policy can utilize these opportunities more properly than the
greedy policy. For the greedy policy, in contrast, when
$|\mathcal{M}_s|$ increases, the average power cost decreases, but the
average delay and network costs increase. The intuitive reasons are
as follows. First, the cost function $C(\mathbf{Q},\mathbf{u})$ is not equivalent
to the original objective function in Problem 1. Second,
under the simulation settings, when $|\mathcal{M}_s|$ increases, i.e., the
transmission opportunities of the SBSs increase, the greedy
policy is more likely to schedule the SBSs, which leads
to a certain reduction of the average power cost but a much larger
increase of the average delay cost, and hence an increase of the
average network cost. This reveals that the greedy
policy cannot properly utilize the transmission opportunities
offered by the increase of cache sizes.

Fig. 5: Average network, delay, and power costs versus Zipf parameter $\alpha$. $K = 30$ and $|\mathcal{M}_1| = |\mathcal{M}_2| = 10$.

Fig. 6: Average network, delay, and power costs versus cache size of SBSs $|\mathcal{M}_s|$. Panels: (a) average network cost; (b) average delay cost; (c) average power cost. $|\mathcal{M}_s| = |\mathcal{M}_1| = |\mathcal{M}_2|$, $K = 30$, and $\alpha = 0.75$.

Fig. 7: Average network cost, delay cost, power cost, and network cost per user versus number of users $K$. Panels: (a) average network cost; (b) average delay cost; (c) average power cost; (d) average network cost per user. $\alpha = 0.75$ and $|\mathcal{M}_1| = |\mathcal{M}_2| = 10$.

Fig. 7 illustrates the average costs versus the number of
users $K$. From Fig. 7(a)-(c), it can be seen that as $K$ increases,
the average network, delay, and power costs
of the proposed suboptimal policy increase. The reason is
that, when the average request arrival rate increases (as $K$
increases), there will be more requests waiting for service and
the BSs are more willing to operate instead of keeping idle.
From Fig. 7(d), we can see that as $K$ increases, the
average network costs per user of all policies decrease. This
reflects the benefit of multicast.

Next, we compare the computational complexity of the standard
optimal algorithm (PIA), the proposed structure-aware
optimal algorithm (SPIA) and the proposed low-complexity
suboptimal algorithm (SSA). Table II illustrates the computational
time comparison for a network with 1 MBS, 1 SBS
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TABLE II: Average computation time (sec) for different algorithms. 


Algorithm               Step          |M_0|=2, |M_1|=1   |M_0|=3, |M_1|=1   |M_0|=3, |M_1|=2
PIA [32, Chapter 8.6]   evaluation    3.283              28.955             330.88
                        improvement   [illegible]        [illegible]        187.35
                        total         5.385              48.091             [illegible]
SPIA (Algorithm 1)      evaluation    3.288              28.436             318.63
                        improvement   [illegible]        8.887              88.360
                        total         [illegible]        37.325             406.99
SSA (Algorithm 2)       Step 1        0.008              0.015              0.054
                        Step 2        0.223              3.859              36.240
                        total         0.231              3.874              36.294

and 4 users. The simulation is carried out on a Windows x86
machine with a dual-core 2.93 GHz Intel processor and 4 GB
RAM using Python 3. We set $|\mathcal{K}_0| = |\mathcal{K}_1| = 2$, $w = 1$,
$\alpha = 0.75$, and $N_{n,m} = 4$ for all $n \in \mathcal{N}$ and $m \in \mathcal{M}_n$. It can be
seen that SPIA has much lower computational complexity than
PIA, with a reduction of about 50% in the policy improvement
step and a reduction of about 20% in the total algorithm.
Moreover, we observe that the computation times of PIA and
SPIA grow rapidly with the cardinality of the system state
space ($|\mathcal{Q}| = \prod_{n\in\mathcal{N}}\prod_{m\in\mathcal{M}_n} |\mathcal{Q}_{n,m}|$), while
the computation time of SSA grows almost linearly with $|\mathcal{Q}|$.
The computational savings of SSA compared with PIA and
SPIA are significant. These observations verify the discussions in
Section V and Section VI.

VIII. Conclusion 

In this paper, we study the optimal content delivery strategy in a
cache-enabled HetNet by taking into account the inherent multicast
capability of the wireless medium. We establish a content-centric
request queue model and then formulate a stochastic
content multicast scheduling problem to jointly minimize the
average network delay and power costs under a multiple access
constraint. This stochastic optimization problem is an infinite
horizon average cost MDP. We show that the optimal multicast
scheduling policy, which is adaptive to the request queue
state, is of threshold type. Then, we propose a structure-aware
optimal algorithm to obtain the optimal policy by exploiting
its structural properties. To further reduce the complexity, we
propose a low-complexity suboptimal policy, which has similar
structural properties to the optimal policy, and develop a
low-complexity algorithm to obtain this policy.

This work opens up several directions for future research. 
First, this work focuses on multicast scheduling for given 
cache placement. It would be interesting to investigate the 
joint optimal design of content delivery and cache place¬ 
ment/replacement. Second, in this work, IRM is used to model 
the content request arrivals. It is of particular importance 
to consider more practical request traffic models. Finally, in 
this work, we assume that the coverage areas of the SBSs 
are disjoint and the SBSs can operate concurrently without 
mutual interference. It is also of interest to take into account 
the interference management in cache-enabled HetNets under 
more general topology and interference models. 

Appendix A: Proof of Lemma 1

By Proposition 4.2.5 in [19], the Weak Accessibility (WA)
condition holds for unichain policies. Thus, by Proposition
4.2.3 in [19], the optimal average network cost of the MDP
in Problem 1 is the same for all initial states. In addition,
by Proposition 4.2.1 in [19], we know that the solution
$(\theta, \{V(\mathbf{Q})\})$ to the following Bellman equation exists.

$$\theta + V(\mathbf{Q}) = \min_{\mathbf{u}\in\mathcal{U}} \Big\{ g(\mathbf{Q},\mathbf{u}) + \sum_{\mathbf{Q}'\in\mathcal{Q}} \Pr[\mathbf{Q}'|\mathbf{Q},\mathbf{u}]V(\mathbf{Q}') \Big\}, \quad \forall \mathbf{Q} \in \mathcal{Q}. \qquad (30)$$

By substituting the transition probability $\Pr[\mathbf{Q}'|\mathbf{Q},\mathbf{u}]$
into (30), we have (10), which completes the proof. ■

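Appendix B: Proof of Lemma 2

We prove Lemma 2 using RVIA and mathematical induction.

First, we introduce RVIA [19, Chapter 4.3]. For each state
$\mathbf{Q} \in \mathcal{Q}$, let $V_l(\mathbf{Q})$ be the value function in the $l$-th iteration,
where $l = 0, 1, \cdots$. Define the state-action cost function in
the $l$-th iteration as

$$J_{l+1}(\mathbf{Q},\mathbf{u}) \triangleq g(\mathbf{Q},\mathbf{u}) + \mathbb{E}[V_l(\mathbf{Q}')], \qquad (31)$$

where $g(\mathbf{Q},\mathbf{u})$ is the per-stage cost and $\mathbf{Q}'$ is given in Lemma 1.
Note that $J_{l+1}(\mathbf{Q},\mathbf{u})$ is related to the R.H.S. of
the Bellman equation in (10). For each $\mathbf{Q}$, RVIA calculates
$V_{l+1}(\mathbf{Q})$ according to

$$V_{l+1}(\mathbf{Q}) = \min_{\mathbf{u}\in\mathcal{U}} J_{l+1}(\mathbf{Q},\mathbf{u}) - \min_{\mathbf{u}\in\mathcal{U}} J_{l+1}(\mathbf{Q}^\dagger,\mathbf{u}), \quad \forall l, \qquad (32)$$

where $\mathbf{Q}^\dagger \in \mathcal{Q}$ is some fixed state. Under any initialization of
$V_0(\mathbf{Q})$, the generated sequence $\{V_l(\mathbf{Q})\}$ converges to $V(\mathbf{Q})$
[19, Proposition 4.3.2], i.e.,

$$\lim_{l\to\infty} V_l(\mathbf{Q}) = V(\mathbf{Q}), \quad \forall \mathbf{Q} \in \mathcal{Q}, \qquad (33)$$

where $V(\mathbf{Q})$ satisfies the Bellman equation in (10). Let $\mu_l^*(\mathbf{Q})$
denote the control that attains the minimum of the first term
in (32) in the $l$-th iteration for all $\mathbf{Q}$, i.e.,

$$\mu_l^*(\mathbf{Q}) = \arg\min_{\mathbf{u}\in\mathcal{U}} J_{l+1}(\mathbf{Q},\mathbf{u}), \quad \forall \mathbf{Q} \in \mathcal{Q}. \qquad (34)$$

Define $\mu_l^*(\mathbf{Q}) = (\mu_{l,n}^*(\mathbf{Q}))_{n\in\mathcal{N}}$, where $\mu_{l,n}^*(\mathbf{Q})$ denotes the
control action of BS $n$ for state $\mathbf{Q}$. We refer to $\mu_l^*$ as the
optimal policy for the $l$-th iteration.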
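The RVIA recursion in (32) and (34) can be sketched for a generic finite MDP as follows. This is an illustrative implementation, not the authors' code: the callables `cost` and `transition`, the reference state `ref` (playing the role of $\mathbf{Q}^\dagger$), and the single-queue toy model used to exercise it are all assumptions introduced here.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable


def relative_value_iteration(
    states: List[State],
    actions: List[int],
    cost: Callable[[State, int], float],
    transition: Callable[[State, int], Dict[State, float]],
    ref: State,
    max_iters: int = 10_000,
    tol: float = 1e-12,
) -> Tuple[float, Dict[State, float], Dict[State, int]]:
    """Iterate V_{l+1}(s) = min_u J(s,u) - min_u J(ref,u) until it settles."""
    v = {s: 0.0 for s in states}

    def j(s: State, u: int, values: Dict[State, float]) -> float:
        # State-action cost: per-stage cost plus expected next value.
        return cost(s, u) + sum(p * values[s2] for s2, p in transition(s, u).items())

    theta = 0.0
    for _ in range(max_iters):
        theta = min(j(ref, u, v) for u in actions)  # offset at the fixed state
        new_v = {s: min(j(s, u, v) for u in actions) - theta for s in states}
        gap = max(abs(new_v[s] - v[s]) for s in states)
        v = new_v
        if gap < tol:
            break
    policy = {s: min(actions, key=lambda u: j(s, u, v)) for s in states}
    return theta, v, policy


# Hypothetical toy model: one queue capped at CAP, one arrival w.p. 0.5,
# action 1 clears the queue at an extra cost POWER.
CAP, POWER = 3, 2.5


def toy_cost(q: int, u: int) -> float:
    return q + POWER * u


def toy_transition(q: int, u: int) -> Dict[int, float]:
    kept = 0 if u == 1 else q
    nxt: Dict[int, float] = {}
    for arrivals in (0, 1):
        q2 = min(kept + arrivals, CAP)
        nxt[q2] = nxt.get(q2, 0.0) + 0.5
    return nxt


theta, v, policy = relative_value_iteration(
    list(range(CAP + 1)), [0, 1], toy_cost, toy_transition, ref=0
)
```

On this toy model the returned policy idles at short queue lengths and serves at long ones, i.e., it is of threshold type, and `theta` approximates the optimal average cost.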

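Next, we prove Lemma 2 through induction using
RVIA. Denote $\mathbf{Q}^1 = (Q^1_{n,m})_{n\in\mathcal{N}, m\in\mathcal{M}_n}$ and $\mathbf{Q}^2 = (Q^2_{n,m})_{n\in\mathcal{N}, m\in\mathcal{M}_n}$.
To prove Lemma 2, it is sufficient to show
that for any $\mathbf{Q}^1, \mathbf{Q}^2 \in \mathcal{Q}$ such that $\mathbf{Q}^2 \succeq \mathbf{Q}^1$,

$$V_l(\mathbf{Q}^2) \ge V_l(\mathbf{Q}^1) \qquad (35)$$

holds for all $l = 0, 1, \cdots$. Note that $\mathbf{Q}^1$ and $\mathbf{Q}^2$ can lie on
the boundaries of the state space $\mathcal{Q}$.

First, we initialize $V_0(\mathbf{Q}) = 0$ for all $\mathbf{Q} \in \mathcal{Q}$. Thus, we have
$V_0(\mathbf{Q}^1) = V_0(\mathbf{Q}^2) = 0$, i.e., (35) holds for $l = 0$. Assume that
(35) holds for some $l \ge 0$. We will prove that (35) also holds
for $l + 1$. By (32), we have

$$\begin{aligned} V_{l+1}(\mathbf{Q}^1) &= J_{l+1}(\mathbf{Q}^1, \mu_l^*(\mathbf{Q}^1)) - J_{l+1}(\mathbf{Q}^\dagger, \mu_l^*(\mathbf{Q}^\dagger)) \\ &\overset{(a)}{\le} J_{l+1}(\mathbf{Q}^1, \mu_l^*(\mathbf{Q}^2)) - J_{l+1}(\mathbf{Q}^\dagger, \mu_l^*(\mathbf{Q}^\dagger)) \\ &\overset{(b)}{=} \mathbb{E}[V_l(\mathbf{Q}^{1'})] + d(\mathbf{Q}^1) + w p(\mu_l^*(\mathbf{Q}^2)) - J_{l+1}(\mathbf{Q}^\dagger, \mu_l^*(\mathbf{Q}^\dagger)), \end{aligned} \qquad (36)$$

where (a) follows from the optimality of $\mu_l^*(\mathbf{Q}^1)$ for $\mathbf{Q}^1$
in the $l$-th iteration, (b) directly follows from (31), and
$\mathbf{Q}^{1'} = (Q^{1'}_{n,m})_{n\in\mathcal{N}, m\in\mathcal{M}_n}$ with $Q^{1'}_{0,m} = \min\{\mathbf{1}(\mu_{l,0}^*(\mathbf{Q}^2) \neq m)Q^1_{0,m} + A_{0,m}, N_{0,m}\}$
for all $m \in \mathcal{M}_0$ and $Q^{1'}_{n,m} = \min\{\mathbf{1}(\mu_{l,0}^*(\mathbf{Q}^2) \neq m \;\&\; \mu_{l,n}^*(\mathbf{Q}^2) \neq m)Q^1_{n,m} + A_{n,m}, N_{n,m}\}$
for all $n \in \mathcal{N}^+$ and $m \in \mathcal{M}_n$. By (31) and (32), we also have

$$\begin{aligned} V_{l+1}(\mathbf{Q}^2) &= J_{l+1}(\mathbf{Q}^2, \mu_l^*(\mathbf{Q}^2)) - J_{l+1}(\mathbf{Q}^\dagger, \mu_l^*(\mathbf{Q}^\dagger)) \\ &= \mathbb{E}[V_l(\mathbf{Q}^{2'})] + d(\mathbf{Q}^2) + w p(\mu_l^*(\mathbf{Q}^2)) - J_{l+1}(\mathbf{Q}^\dagger, \mu_l^*(\mathbf{Q}^\dagger)), \end{aligned} \qquad (37)$$

where $\mathbf{Q}^{2'} = (Q^{2'}_{n,m})_{n\in\mathcal{N}, m\in\mathcal{M}_n}$ with $Q^{2'}_{0,m} = \min\{\mathbf{1}(\mu_{l,0}^*(\mathbf{Q}^2) \neq m)Q^2_{0,m} + A_{0,m}, N_{0,m}\}$
for all $m \in \mathcal{M}_0$ and $Q^{2'}_{n,m} = \min\{\mathbf{1}(\mu_{l,0}^*(\mathbf{Q}^2) \neq m \;\&\; \mu_{l,n}^*(\mathbf{Q}^2) \neq m)Q^2_{n,m} + A_{n,m}, N_{n,m}\}$
for all $n \in \mathcal{N}^+$ and $m \in \mathcal{M}_n$.

Then, we compare (36) and (37) term by term.
Due to $\mathbf{Q}^2 \succeq \mathbf{Q}^1$, we have $\mathbf{Q}^{2'} \succeq \mathbf{Q}^{1'}$, implying
that $\mathbb{E}[V_l(\mathbf{Q}^{2'})] \ge \mathbb{E}[V_l(\mathbf{Q}^{1'})]$ by the induction hypothesis.
In addition, since $d(\mathbf{Q})$ is a monotonically non-decreasing
function of $\mathbf{Q}$, we have $d(\mathbf{Q}^2) \ge d(\mathbf{Q}^1)$. Thus, we have
$V_{l+1}(\mathbf{Q}^2) \ge V_{l+1}(\mathbf{Q}^1)$, i.e., (35) holds for $l + 1$. Therefore,
by induction, we can show that (35) holds for any $l$. By taking
limits on both sides of (35) and by (33), we complete the proof
of Lemma 2. ■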


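Appendix C: Proof of Lemma 3

First, we derive the general relationship between $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}^1)$
and $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}^2)$ for any $\mathbf{u}, \mathbf{v} \in \mathcal{U}$ and any $\mathbf{Q}^1, \mathbf{Q}^2 \in \mathcal{Q}$, which
will be used to prove the two properties in Lemma 3. By (13),
for any $\mathbf{u}, \mathbf{v} \in \mathcal{U}$ and any $\mathbf{Q}^1, \mathbf{Q}^2 \in \mathcal{Q}$, we have

$$\begin{aligned} &\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}^1) - \Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}^2) \\ &= \big(d(\mathbf{Q}^1) + w p(\mathbf{u}) + \mathbb{E}[V(\mathbf{Q}^{1,\mathbf{u}})] - d(\mathbf{Q}^1) - w p(\mathbf{v}) - \mathbb{E}[V(\mathbf{Q}^{1,\mathbf{v}})]\big) \\ &\quad - \big(d(\mathbf{Q}^2) + w p(\mathbf{u}) + \mathbb{E}[V(\mathbf{Q}^{2,\mathbf{u}})] - d(\mathbf{Q}^2) - w p(\mathbf{v}) - \mathbb{E}[V(\mathbf{Q}^{2,\mathbf{v}})]\big) \\ &= \mathbb{E}[V(\mathbf{Q}^{1,\mathbf{u}})] - \mathbb{E}[V(\mathbf{Q}^{1,\mathbf{v}})] - \mathbb{E}[V(\mathbf{Q}^{2,\mathbf{u}})] + \mathbb{E}[V(\mathbf{Q}^{2,\mathbf{v}})], \end{aligned} \qquad (38)$$

where

$$\begin{aligned} Q^{1,\mathbf{u}}_{0,m} &= \min\{\mathbf{1}(u_0 \neq m)Q^1_{0,m} + A_{0,m}, N_{0,m}\}, \quad \forall m \in \mathcal{M}_0 \\ Q^{1,\mathbf{u}}_{n,m} &= \min\{\mathbf{1}(u_0 \neq m \;\&\; u_n \neq m)Q^1_{n,m} + A_{n,m}, N_{n,m}\}, \quad \forall n \in \mathcal{N}^+, \forall m \in \mathcal{M}_n \end{aligned} \qquad (39a)$$

$$\begin{aligned} Q^{1,\mathbf{v}}_{0,m} &= \min\{\mathbf{1}(v_0 \neq m)Q^1_{0,m} + A_{0,m}, N_{0,m}\}, \quad \forall m \in \mathcal{M}_0 \\ Q^{1,\mathbf{v}}_{n,m} &= \min\{\mathbf{1}(v_0 \neq m \;\&\; v_n \neq m)Q^1_{n,m} + A_{n,m}, N_{n,m}\}, \quad \forall n \in \mathcal{N}^+, \forall m \in \mathcal{M}_n \end{aligned} \qquad (39b)$$

$$\begin{aligned} Q^{2,\mathbf{u}}_{0,m} &= \min\{\mathbf{1}(u_0 \neq m)Q^2_{0,m} + A_{0,m}, N_{0,m}\}, \quad \forall m \in \mathcal{M}_0 \\ Q^{2,\mathbf{u}}_{n,m} &= \min\{\mathbf{1}(u_0 \neq m \;\&\; u_n \neq m)Q^2_{n,m} + A_{n,m}, N_{n,m}\}, \quad \forall n \in \mathcal{N}^+, \forall m \in \mathcal{M}_n \end{aligned} \qquad (39c)$$

$$\begin{aligned} Q^{2,\mathbf{v}}_{0,m} &= \min\{\mathbf{1}(v_0 \neq m)Q^2_{0,m} + A_{0,m}, N_{0,m}\}, \quad \forall m \in \mathcal{M}_0 \\ Q^{2,\mathbf{v}}_{n,m} &= \min\{\mathbf{1}(v_0 \neq m \;\&\; v_n \neq m)Q^2_{n,m} + A_{n,m}, N_{n,m}\}, \quad \forall n \in \mathcal{N}^+, \forall m \in \mathcal{M}_n \end{aligned} \qquad (39d)$$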


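Next, based on (38), we prove Property 1 in Lemma 3.
Suppose action $\mathbf{u}$ satisfies that $u_0 = j \in \mathcal{M}_0$, and $\mathbf{Q}^1$ and
$\mathbf{Q}^2$ satisfy the following relation:

$$\begin{cases} Q^2_{n,m} \ge Q^1_{n,m}, & \text{if } n \in \{0\} \cup \mathcal{N}_j^+ \text{ and } m = j, \\ Q^2_{n,m} = Q^1_{n,m}, & \text{otherwise} \end{cases} \qquad (40)$$

for each $n \in \mathcal{N}$ and $m \in \mathcal{M}_n$. By comparing (39a) with (39c),
we can see that $Q^{1,\mathbf{u}}_{n,m} = Q^{2,\mathbf{u}}_{n,m}$ for all $n \in \mathcal{N}$ and $m \in \mathcal{M}_n$,
i.e., $\mathbf{Q}^{1,\mathbf{u}} = \mathbf{Q}^{2,\mathbf{u}}$. Thus, we have $\mathbb{E}[V(\mathbf{Q}^{1,\mathbf{u}})] = \mathbb{E}[V(\mathbf{Q}^{2,\mathbf{u}})]$.
By comparing (39b) with (39d), we can see that $\mathbf{Q}^{2,\mathbf{v}} \succeq \mathbf{Q}^{1,\mathbf{v}}$.
Thus, by Lemma 2, we have $\mathbb{E}[V(\mathbf{Q}^{2,\mathbf{v}})] \ge \mathbb{E}[V(\mathbf{Q}^{1,\mathbf{v}})]$.
Therefore, by (38), we have $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}^2) \le \Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}^1)$, which
completes the proof of Property 1 in Lemma 3.

Finally, based on (38), we prove Property 2 in Lemma 3.
Suppose action $\mathbf{u}$ satisfies that $u_i = j \in \mathcal{M}_i$ for some $i \in \mathcal{N}^+$,
and $\mathbf{Q}^1$ and $\mathbf{Q}^2$ satisfy the following relation:

$$\begin{cases} Q^2_{n,m} \ge Q^1_{n,m}, & \text{if } n = i \text{ and } m = j, \\ Q^2_{n,m} = Q^1_{n,m}, & \text{otherwise} \end{cases} \qquad (41)$$

for each $n \in \mathcal{N}$ and $m \in \mathcal{M}_n$. Similarly, by comparing (39a)
with (39c) and comparing (39b) with (39d), we can see that
$\mathbf{Q}^{1,\mathbf{u}} = \mathbf{Q}^{2,\mathbf{u}}$ and $\mathbf{Q}^{2,\mathbf{v}} \succeq \mathbf{Q}^{1,\mathbf{v}}$. Therefore, by Lemma 2 and
(38), we have $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}^2) \le \Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}^1)$, which completes the
proof of Property 2 in Lemma 3. ■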


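Appendix D: Proof of Theorem 1

We first prove Property 1 of Theorem 1. Consider multicast
scheduling action $\mathbf{u} = \mathbf{0}$, BS $n \in \mathcal{N}$, content $m \in \mathcal{M}_n$,
another action $\mathbf{v} = (v_i)_{i\in\mathcal{N}} \in \mathcal{U}$ where $v_n = m$, and
request queue state $\mathbf{Q} = (Q_{i,j})_{i\in\mathcal{N}, j\in\mathcal{M}_i} \in \mathcal{Q}$ where
$Q_{n,m} = \phi_{\mathbf{0}}^{+}(\mathbf{Q}_{-n,-m})$. Note that, if $\phi_{\mathbf{0}}^{+}(\mathbf{Q}_{-n,-m}) = -\infty$,
Property 1 of Theorem 1 always holds. Therefore, in the
following, we only consider $\phi_{\mathbf{0}}^{+}(\mathbf{Q}_{-n,-m}) > -\infty$. According
to the definition of $\phi_{\mathbf{0}}^{+}(\mathbf{Q}_{-n,-m})$ in (15), we can see that
$\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}) \le 0$, i.e., $\mathbf{u}$ dominates $\mathbf{v}$ at state $\mathbf{Q}$. Now consider
another queue state $\mathbf{Q}' = (Q'_{i,j})_{i\in\mathcal{N}, j\in\mathcal{M}_i} \in \mathcal{Q}$ where
$Q'_{n,m} \le Q_{n,m}$ and $Q'_{i,j} = Q_{i,j}$ for all $(i,j) \neq (n,m)$. Since
$\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}) = -\Delta_{\mathbf{v},\mathbf{u}}(\mathbf{Q})$, by Lemma 3, we know that $\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q})$
is monotonically non-decreasing with $Q_{n,m}$. Thus, we have

$$\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}') \le \Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}) \le 0, \qquad (42)$$

i.e., $\mathbf{u}$ dominates $\mathbf{v}$ at state $\mathbf{Q}'$. By the definition of $\mathcal{Q}_0$, we
can see that if $\mathbf{Q} \in \mathcal{Q}_0$, $\mathbf{u}$ dominates all $\mathbf{v} \in \mathcal{U}$ and $\mathbf{v} \neq \mathbf{u}$ at
state $\mathbf{Q}$, i.e., $\mu^*(\mathbf{Q}) = \mathbf{0}$. We complete the proof of Property
1 in Theorem 1.

Next, we prove Property 2 of Theorem 1. Consider BS
$n \in \mathcal{N}$, content $m \in \mathcal{M}_n$, multicast scheduling action
$\mathbf{u} = (u_i)_{i\in\mathcal{N}} \in \mathcal{U}$ where $u_n = m$, and request queue state
$\mathbf{Q} = (Q_{i,j})_{i\in\mathcal{N}, j\in\mathcal{M}_i} \in \mathcal{Q}$ where $Q_{n,m} = \phi_{\mathbf{u}}^{-}(\mathbf{Q}_{-n,-m})$.
Similar to the proof of Property 1 of Theorem 1, we only
need to consider $\phi_{\mathbf{u}}^{-}(\mathbf{Q}_{-n,-m}) < +\infty$. According to
the definition of $\phi_{\mathbf{u}}^{-}(\mathbf{Q}_{-n,-m})$ in (16), we can see that
$\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}) \le 0$ for all $\mathbf{v} \in \mathcal{U}$ and $\mathbf{v} \neq \mathbf{u}$, i.e., $\mu^*(\mathbf{Q}) = \mathbf{u}$.

• We first prove the first part of Property 2. Consider queue
state $\mathbf{Q}^1 = (Q^1_{i,j})_{i\in\mathcal{N}, j\in\mathcal{M}_i} \in \mathcal{Q}$ where $Q^1_{n,m} \ge Q_{n,m}$
and $Q^1_{i,j} = Q_{i,j}$ for all $(i,j) \neq (n,m)$. To prove the first
part of Property 2, it is equivalent to show that $\mu^*(\mathbf{Q}^1) = \mathbf{u}$.
By Lemma 3, for all $\mathbf{v} \in \mathcal{U}$ and $\mathbf{v} \neq \mathbf{u}$, we have

$$\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}^1) \le \Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}) \le 0, \qquad (43)$$

i.e., $\mu^*(\mathbf{Q}^1) = \mathbf{u}$. We complete the proof of the first part
in Property 2.

• Then, we prove the second part of Property 2, i.e., the
monotonically non-increasing property of $\phi_{\mathbf{u}}^{-}(\mathbf{Q}_{-0,-m})$
in terms of $Q_{n,m}$ for all $n \in \mathcal{N}_m^+$. Consider another queue
state $\mathbf{Q}^2 = (Q^2_{i,j})_{i\in\mathcal{N}, j\in\mathcal{M}_i} \in \mathcal{Q}$ where $Q^2_{i,j} \ge Q_{i,j}$ if
$i \in \mathcal{N}_m^+$ and $j = m$, and $Q^2_{i,j} = Q_{i,j}$ otherwise. To prove
the second part of Property 2, it is equivalent to show that
$\mu^*(\mathbf{Q}^2) = \mathbf{u}$. By Lemma 3, for all $\mathbf{v} \in \mathcal{U}$ and $\mathbf{v} \neq \mathbf{u}$,
we have

$$\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}^2) \le \Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q}) \le 0, \qquad (44)$$

i.e., $\mu^*(\mathbf{Q}^2) = \mathbf{u}$. We complete the proof of Property 2
in Theorem 1. ■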

Appendix E: Proof of Lemma|4] 

Along the line of the proof of Lemma 3 in f^, we 
prove the additive property of the value function. Note that, 
we have 5 (Q,u) = 

3n,m 

In addition, by the relation between the joint 
distribution and the marginal distribution, we have 

= Y.Q'^ Therefore, 

by substituting 9 = E„eA^ EmGAt„ 

^(Q) = ^n^J\f^mF M„ ^ n,m{Qn,m) info ( 122 J), We 

can see that the equality in ( 122 ) holds, which completes the 
proof. ■ 

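Appendix F: Proof of Theorem 2

First, we show that for all $n \in \mathcal{N}$ and $m \in \mathcal{M}_n$, the
per-BS-content value function $\hat{V}_{n,m}(Q_{n,m})$ satisfies

$$\hat{V}_{n,m}(Q^2_{n,m}) \ge \hat{V}_{n,m}(Q^1_{n,m}) \qquad (45)$$

for any $Q^1_{n,m}, Q^2_{n,m} \in \mathcal{Q}_{n,m}$ such that $Q^2_{n,m} \ge Q^1_{n,m}$. Similar
to the proof of Lemma 2, we define:

$$\hat{J}^{l+1}_{n,m}(Q_{n,m}) = \mathbb{E}^{\hat{\mu}}[g_{n,m}(Q_{n,m},\mathbf{u})] + \sum_{Q'_{n,m}\in\mathcal{Q}_{n,m}} \mathbb{E}^{\hat{\mu}}\big[\Pr[Q'_{n,m}|Q_{n,m},\mathbf{u}]\big]\hat{V}^{l}_{n,m}(Q'_{n,m}), \qquad (46)$$

where $\hat{V}^{l}_{n,m}(Q_{n,m})$ denotes the per-BS-content value function
in the $l$-th iteration. For each $Q_{n,m} \in \mathcal{Q}_{n,m}$, RVIA calculates
$\hat{V}^{l+1}_{n,m}(Q_{n,m})$ according to:

$$\hat{V}^{l+1}_{n,m}(Q_{n,m}) = \hat{J}^{l+1}_{n,m}(Q_{n,m}) - \hat{J}^{l+1}_{n,m}(Q^\dagger_{n,m}), \quad \forall l, \qquad (47)$$

where $Q^\dagger_{n,m} \in \mathcal{Q}_{n,m}$ is some fixed state. Following the proof
of Lemma 2, we can prove that for any $Q^1_{n,m}, Q^2_{n,m} \in \mathcal{Q}_{n,m}$
such that $Q^2_{n,m} \ge Q^1_{n,m}$, $\hat{V}^{l}_{n,m}(Q^2_{n,m}) \ge \hat{V}^{l}_{n,m}(Q^1_{n,m})$ holds
for all $l = 0, 1, \cdots$. Thus, we can show that (45) holds
through induction using RVIA. Then, by (45), we can show
that $\hat{\Delta}_{\mathbf{u},\mathbf{v}}(\mathbf{Q})$ possesses the same monotonicity properties as
$\Delta_{\mathbf{u},\mathbf{v}}(\mathbf{Q})$, following the proof of Lemma 3. Finally, by using
the monotonicity properties of $\hat{\Delta}_{\mathbf{u},\mathbf{v}}(\mathbf{Q})$, we can show the
structural properties of $\hat{\mu}^*$, following the proof of Theorem 1.
We complete the proof. ■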

References 

[1] B. Zhou, Y. Cui, and M. Tao, “Stochastic content-centric multicast 
scheduling for cache-enabled heterogeneous cellular networks,” in ACM 
CoNEXT2015 Workshop on CCDWN, Dec. 2015. 

[2] Cisco, “Cisco visual networking index: Global mobile data traffic 
forecast update, 2014-2019,” White Paper, Feb 2015. 

[3] A. Ghosh, N. Mangalvedhe, R. Ratasuk, B. Mondal, M. Cudak, E. Visotsky,
T. Thomas, J. Andrews, P. Xia, H. Jo, H. Dhillon, and T. Novlan,
“Heterogeneous cellular networks: From theory to practice,” IEEE
Commun. Mag., vol. 50, no. 6, pp. 54-64, June 2012.

[4] A. F. Molisch, G. Caire, D. Ott, J. R. Foerster, D. Bethanabhotla, 
and M. Ji, “Caching eliminates the wireless bottleneck in video aware 
wireless networks,” Advances in Electrical Engineering, vol. 2014, 2014. 

[5] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role 
of proactive caching in 5G wireless networks,” IEEE Commun. Mag., 
vol. 52, no. 8, pp. 82-89, Aug 2014. 

[6] G. Paschos, E. Baştuğ, I. Land, G. Caire, and M. Debbah, “Wireless
caching: Technical misconceptions and business barriers,” arXiv preprint 
arXiv:1602.00173, 2016. 

[7] M. A. Maddah-Ali and U. Niesen, “Cache-aided interference channels,” 
in Proc. IEEE ISIT, June 2015, pp. 809-813. 

[8] K. Shanmugam, N. Golrezaei, A. Dimakis, A. Molisch, and G. Caire, 
“Femtocaching: Wireless content delivery through distributed caching 
helpers,” IEEE Trans. Inf. Theory, vol. 59, no. 12, Dec 2013. 

[9] M. Dehghan, A. Seetharam, B. Jiang, T. He, T. Salonidis, J. Kurose, 
D. Towsley, and R. Sitaraman, “On the complexity of optimal routing 
and content caching in heterogeneous networks,” in Proc. IEEE INFO- 
COM, April 2015. 

[10] A. Liu and V. Lau, “Cache-enabled opportunistic cooperative MIMO 
for video streaming in wireless systems,” IEEE Trans. Signal Process., 
vol. 62, no. 2, pp. 390-402, Jan 2014.

[11] E. Bastug, M. Bennis, M. Kountouris, and M. Debbah, “Cache-enabled 
small cell networks: modeling and tradeoffs,” EURASIP Journal of 
Wireless Communications and Networking, vol. 2015, p. 1, 2015. 

[12] C. Yang, Y. Yao, Z. Chen, and B. Xia, “Analysis on cache-enabled
wireless heterogeneous networks,” IEEE Trans. Wireless Commun.,
vol. PP, no. 99, pp. 1-1, 2015.

[13] D. Lecompte and F. Gabin, “Evolved multimedia broadcast/multicast
service (eMBMS) in LTE-advanced: overview and Rel-11 enhancements,”
IEEE Commun. Mag., vol. 50, no. 11, pp. 68-74, 2012.

[14] M. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEE
Trans. Inf. Theory, vol. 60, no. 5, pp. 2856-2867, May 2014. 

[15] U. Niesen and M. Maddah-Ali, “Coded caching for delay-sensitive 
content,” in Proc. IEEE ICC, June 2015. 




[16] K. Poularakis, G. Iosifidis, V. Sourlas, and L. Tassiulas, “Exploiting
caching and multicast for 5G wireless networks,” IEEE Trans. Wireless 
Commun., vol. 15, no. 4, pp. 2995-3007, April 2016. 

[17] N. Abedini and S. Shakkottai, “Content caching and scheduling in 
wireless networks with elastic and inelastic traffic,” IEEE/ACM Trans.
Netw., vol. 22, no. 3, pp. 864-874, June 2014. 

[18] B. Zhou, Y. Cui, and M. Tao, “Optimal dynamic multicast scheduling for 
cache-enabled content-centric wireless networks,” in Proc. IEEE ISIT, 
June 2015. 

[19] D. P. Bertsekas, Dynamic programming and optimal control, 3rd edition, 
volume II. Belmont, MA: Athena Scientific, 2011. 

[20] M. Shifrin, R. Atar, and I. Cidon, “Optimal scheduling in the hybrid- 
cloud,” in Proc. IFIP/IEEE International Symposium on Integrated 
Network Management (IM 2013), May 2013, pp. 51-59. 

[21] M. Shifrin, A. Cohen, O. Gurewitz, and O. Weisman, “Coded retransmission
in wireless networks via abstract MDPs: Theory and algorithms,”
arXiv preprint arXiv:1502.02893, 2015. 

[22] E. Altman, R. El-Azouzi, D. S. Menasche, and Y. Xu, “Forever young: 
Aging control for smartphones in hybrid networks,” arXiv preprint 
arXiv:1009.4733, 2010. 

[23] C. H. Xia, G. Michailidis, N. Bambos, and P. W. Glynn, “Optimal control 
of parallel queues with batch service,” Probability in the Engineering 
and Informational Sciences, vol. 16, no. 03, pp. 289-307, 2002. 

[24] R. Gummadi, “Optimal control of a broadcasting server,” in Proc. 48th 
IEEE Conf. Decision Control (CDC/CCC), Dec. 2009. 

[25] D. V. Djonin and V. Krishnamurthy, “MIMO transmission control in 
fading channels: a constrained Markov decision process formulation with
monotone randomized policies,” IEEE Trans. Signal Process., vol. 55, 
no. 10, pp. 5069-5083, 2007. 

[26] A. H. Elwany, N. Z. Gebraeel, and L. M. Maillart, “Structured replacement
policies for components with complex degradation processes and
dedicated sensors,” Oper. Res., vol. 59, no. 3, pp. 684-695, 2011.

[27] E. J. Rosensweig, D. S. Menasche, and J. Kurose, “On the steady-state 
of cache networks,” in Proc. IEEE INFOCOM, April 2013. 

[28] K. Deb, “Multi-objective optimization,” in Search methodologies. 
Springer, 2014, pp. 403-449. 

[29] R. A. Berry and R. G. Gallager, “Communication over fading channels 
with delay constraints,” IEEE Trans. Inf. Theory, vol. 48, no. 5, pp. 
1135-1149, 2002. 

[30] M. Levorato, U. Mitra, and M. Zorzi, “On optimal control of wireless 
networks with multiuser detection, hybrid ARQ and distortion constraints,”
in Proc. IEEE INFOCOM, April 2009.

[31] G. Koole, “Monotonicity in Markov reward and decision chains: Theory
and applications,” Foundations and Trends in Stochastic Systems, vol. 1, 
no. 1, pp. 1-76, 2006. 

[32] M. L. Puterman, Markov decision processes: discrete stochastic dynamic 
programming. New York, NY, USA: Wiley, 2009, vol. 414. 

[33] Y. Cui, V. Lau, and Y. Wu, “Delay-aware BS discontinuous transmission 
control and user scheduling for energy harvesting downlink coordinated
MIMO systems,” IEEE Trans. Signal Process., vol. 60, no. 7,
pp. 3786-3795, July 2012.

[34] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, “Web caching 
and Zipf-like distributions: evidence and implications,” in Proc. IEEE 
INFOCOM, March 1999. 

[35] W. B. Powell, Approximate Dynamic Programming: Solving the curses 
of dimensionality. John Wiley & Sons, 2007, vol. 703. 


